2019-02-25: Development

Mike H, Mike L, Craig, Tommy, Tim Regrets: Kacper

Development planning

  • Updates
  • Discussions
    • Tommy’s Demo
      • Bug in file selection
      • Entrypoint – we need to support it or remove it
      • Home shouldn’t be here
      • Why Data here? Because you can select a subset of files used
      • Demo of using prov editor during package publishing
      • Publish with DOI
        • CW: What happens if I don’t hit publish and want to change things in Whole Tale?
        • TT: There could be a way to save Tale state
      • Demo of manifest.json
        • CW: Question about license
      • Publishing location on metadata
        • TT: There is a URI for the package even without a DOI
      • Discussion of target/licenses: both are still hardcoded in UI
      • MH: About Datasets
        • If the dataset is very large, it may make sense to allow subselect
        • Data manager tracks what’s used is true
      • CW: Why have two different selection mechanisms?
        • MH: If the Tale is exploratory, you might not reference all of the datasets that you’ve selected.
      • Discussion:
        • Whether selection happens before (via Dashboard) or at point of publication
        • MH: Science is a mess – you don’t have it organized up front
          • We should support the “Ball of Mud”
          • You don’t rebuild each time.
        • TM: I like the iterative environment, but how likely is it that you can pull out the right stuff – that it’s runnable
          • If you support subselection, how to you know at the point of publication that you’ve got the right stuff?
          • MH: This is the point of validation
          • CW: But there’s been some confusion over where this occurs.
          • TM: This is the branch model – branch, delete what shouldn’t be in the release, delete what shouldn’t be there, now this is the release. If I want to publish a new version of that release, I can start from there, create another version from there.
            • Trunk is the ball of mud
          • MH: Whether you select or deselect before publishing, you still need to do validation
            • The ability to start the container from scratch, build the “clean” environment
        • TT: Ties into provenance capture
          • There was the idea of capturing provenance from “run.sh” during the validate step
        • ML: What is the difference between the home folder and the Tale workspace?
    • Tim’s case:
      • SKOPE use case:
        • 10s of GB input and output
        • Kyle’s PaleoCar example
        • People thought we’d be able to use WT as a processing environment to get data into the right form for use in SKOPE
        • What if users in SKOPE wanted to re-run an analysis or can they see the Tale that represents the re-processing

Updates

  • Craig:
    • Non-development tasks:
      • EAB meeting
        • Need to get more real Tales into system
        • Need to get DataONE and Export done
    • Development tasks:
      • repo2docker integration
        • rstudio-base “WholeTale” breadcrumb – patch RStudio server
        • CSP_HOSTS for iframes
  • Mike H:
    • Still working on Kubernetes storage drivers, but has reached the three weeks. What’s next?
    • Q. How long do you think it would take to implement the GirderFS driver?
      • MH: Today, we mount GirderFS on the host and then in the container as a host filesystem
      • New issue – implement Flexvolume driver for girderfs
  • Mike L:
    • Bug fixes for v0.6
    • Looking at breadcrumbs problem
    • Open PRs:
      • https://github.com/whole-tale/dashboard/pull/424
      • https://github.com/whole-tale/dashboard/pull/418
  • Tim:
    • I can start exercising the system as soon as I get git capability at a bash prompt.
    • Where should I store data used by Tales if the data (consumed and produced) is a few 100 GB in size?
      • https://github.com/openskope/skope-notebooks/blob/master/download_lbda_v2_dataset_from_noaa.ipynb
    • What is the expected user experience of repo2docker in Whole Tale?
  • Tommy:
    • Craig’s note
      • dataone_publishing on Dashboard)
      • gwvolman dataone_publishing
      • tale_Export girder_wholetale
    • Added Published URL to Tale metadata view so that users don’t need to open the publishing modal to see the location
    • Added support for the refactored Workspace & Data structure to File Picker
    • Refactored manifest endpoint to optionally generate a manifest from a list of item IDs
    • Refactored DataONE publishing code to use the manifest endpoint rather than generating the tale yaml
    • When uploading data to DataONE I added a check to make sure the mimetype is supported
    • Fixed a file size format annoyance in the publishing-modal
    • Improved error messages in publishing-modal (using job/result endpoint)
    • Made fixes to DataONE Python Library calls that were renamed during a library ubdate
    • Fixed a bug where duplicate Dataset entries were being added to the manifest
    • Currently writing a doc on all of the publishing features, implementation, and how to test. See it here
    • Next:
      • Auto open publishing modal after ORCID login (add queryParam to /run route)
      • Fix file check marks missing in file-picker
      • Auto select Tale files in file-picker (I disabled this because it was confusing with the current bug where you can’t see what is selected)
      • Touch bases with Craig about publishing features-what we have and what we want, end goals, scope, etc
      • Discuss replacing parent_dataset with schema:isPartOf or dc:isPartOf in manifest.json
      • Are we allowing people to select files out of Home when publishing?
        • No
        • Should any files in Home that are used in a Tale get copied to the workspace?
          • Use cases:
            • User creates a tale and only uses their home directory. The user shares the Tale and the recipient doesn’t see anything
            • User creates a hybrid tale: some registered data, and analysis scripts from home. User makes the Tale public-people can only see the registered data