2019-02-25: Development
Mike H, Mike L, Craig, Tommy, Tim
Regrets: Kacper
Development planning
- Updates
- Discussions
  - Tommy’s Demo
    - Bug in file selection
      - Entrypoint – we need to support it or remove it
      - Home shouldn’t be here
      - Why is Data here? Because you can select a subset of the files used
    - Demo of using prov editor during package publishing
    - Publish with DOI
      - CW: What happens if I don’t hit publish and want to change things in Whole Tale?
      - TT: There could be a way to save Tale state
    - Demo of manifest.json
      - CW: Question about license
      - Publishing location on metadata
      - TT: There is a URI for the package even without a DOI
      - Discussion of target/licenses: both are still hardcoded in the UI
    - MH: About Datasets
      - If the dataset is very large, it may make sense to allow subselection
      - True, the Data Manager tracks what’s used
      - CW: Why have two different selection mechanisms?
      - MH: If the Tale is exploratory, you might not reference all of the datasets that you’ve selected.
    - Discussion:
      - Whether selection happens before publication (via the Dashboard) or at the point of publication
      - MH: Science is a mess – you don’t have it organized up front
        - We should support the “Ball of Mud”
        - You don’t rebuild each time.
      - TM: I like the iterative environment, but how likely is it that you can pull out the right stuff, so that it’s runnable?
        - If you support subselection, how do you know at the point of publication that you’ve got the right stuff?
      - MH: This is the point of validation
        - CW: But there’s been some confusion over where this occurs.
      - TM: This is the branch model – branch, delete what shouldn’t be in the release, and what remains is the release. If I want to publish a new version of that release, I can start from there and create another version.
        - Trunk is the ball of mud
      - MH: Whether you select or deselect before publishing, you still need to do validation
        - The ability to start the container from scratch and build the “clean” environment
      - TT: Ties into provenance capture
        - There was the idea of capturing provenance from “run.sh” during the validate step
      - ML: What is the difference between the home folder and the Tale workspace?
    - Tim’s case:
      - SKOPE use case:
        - 10s of GB of input and output
        - Kyle’s PaleoCar example
        - People thought we’d be able to use WT as a processing environment to get data into the right form for use in SKOPE
        - What if users in SKOPE want to re-run an analysis, or to see the Tale that represents the reprocessing?
Updates
- Craig:
  - Non-development tasks:
    - EAB meeting
      - Need to get more real Tales into system
      - Need to get DataONE and Export done
  - Development tasks:
    - repo2docker integration
      - rstudio-base “WholeTale” breadcrumb – patch RStudio server
      - CSP_HOSTS for iframes
- Mike H:
  - Still working on Kubernetes storage drivers, but has reached the three-week mark. What’s next?
    - Q. How long do you think it would take to implement the GirderFS driver?
    - MH: Today, we mount GirderFS on the host and then into the container as a host filesystem
    - New issue – implement FlexVolume driver for girderfs
- Mike L:
  - Bug fixes for v0.6
  - Looking at breadcrumbs problem
  - Open PRs:
    - https://github.com/whole-tale/dashboard/pull/424
    - https://github.com/whole-tale/dashboard/pull/418
- Tim:
  - I can start exercising the system as soon as I get git capability at a bash prompt.
  - Where should I store data used by Tales if the data (consumed and produced) is a few hundred GB in size?
    - https://github.com/openskope/skope-notebooks/blob/master/download_lbda_v2_dataset_from_noaa.ipynb
  - What is the expected user experience of repo2docker in Whole Tale?
- Tommy:
  - Craig’s note
    - dataone_publishing on Dashboard
    - dataone_publishing on gwvolman
    - tale_Export on girder_wholetale
  - Added Published URL to Tale metadata view so that users don’t need to open the publishing modal to see the location
  - Added support for the refactored Workspace & Data structure to File Picker
  - Refactored manifest endpoint to optionally generate a manifest from a list of item IDs
  - Refactored DataONE publishing code to use the manifest endpoint rather than generating the tale yaml
  - Added a check that the MIME type is supported when uploading data to DataONE
  - Fixed a file size format annoyance in the publishing-modal
  - Improved error messages in publishing-modal (using job/result endpoint)
  - Fixed calls to the DataONE Python library that were renamed during a library update
  - Fixed a bug where duplicate Dataset entries were being added to the manifest
  - Currently writing a doc on all of the publishing features, implementation, and how to test. See it here
  - Next:
    - Auto-open publishing modal after ORCID login (add queryParam to /run route)
    - Fix missing file check marks in file-picker
    - Auto-select Tale files in file-picker (I disabled this because it was confusing with the current bug where you can’t see what is selected)
    - Touch base with Craig about publishing features: what we have and what we want, end goals, scope, etc.
    - Discuss replacing parent_dataset with schema:isPartOf or dc:isPartOf in manifest.json
    - Are we allowing people to select files out of Home when publishing? No
    - Should any files in Home that are used in a Tale get copied to the workspace?
      - Use cases:
        - User creates a Tale and only uses their home directory. The user shares the Tale and the recipient doesn’t see anything
        - User creates a hybrid Tale: some registered data, and analysis scripts from Home. User makes the Tale public; people can only see the registered data
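Mike H's new issue calls for a FlexVolume driver for girderfs. As a minimal sketch of what that involves, assuming the standard Kubernetes FlexVolume calling convention (the kubelet runs the driver executable as `init`, `mount <mount dir> <json options>`, or `unmount <mount dir>` and reads a JSON object with a `status` field from stdout), a driver could look roughly like this. How girderfs itself would be invoked is not covered in these notes, so the `mountCommand` option is a hypothetical placeholder, not an existing girderfs or Whole Tale setting.

```python
#!/usr/bin/env python3
"""Sketch of a FlexVolume driver wrapping a girderfs mount (hypothetical).

The kubelet calls the driver as:
    driver init
    driver mount <mount dir> <json options>
    driver unmount <mount dir>
and parses a JSON object with a "status" field from stdout.
"""
import json
import shlex
import subprocess
import sys


def reply(status, message="", **extra):
    """Build the JSON-serialisable reply the kubelet expects."""
    out = {"status": status, "message": message}
    out.update(extra)
    return out


def do_init():
    # No separate attach/detach phase: the FUSE mount happens directly in "mount".
    return reply("Success", capabilities={"attach": False})


def do_mount(mount_dir, options_json):
    options = json.loads(options_json)
    # HYPOTHETICAL option: a shell-style command line that mounts girderfs.
    # FlexVolume options arrive as strings, so the command is split with shlex.
    command = options.get("mountCommand")
    if not command:
        return reply("Failure", "missing mountCommand option")
    result = subprocess.run(shlex.split(command) + [mount_dir],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return reply("Failure", result.stderr.strip())
    return reply("Success")


def do_unmount(mount_dir):
    result = subprocess.run(["umount", mount_dir], capture_output=True, text=True)
    if result.returncode != 0:
        return reply("Failure", result.stderr.strip())
    return reply("Success")


def dispatch(argv):
    """Route one driver invocation (argv without the program name)."""
    if not argv:
        return reply("Failure", "no operation given")
    op, args = argv[0], argv[1:]
    if op == "init":
        return do_init()
    if op == "mount" and len(args) == 2:
        return do_mount(*args)
    if op == "unmount" and len(args) == 1:
        return do_unmount(args[0])
    # Operations this driver does not implement must report "Not supported".
    return reply("Not supported", f"operation {op!r} not implemented")


if __name__ == "__main__":
    print(json.dumps(dispatch(sys.argv[1:])))
```

Reporting `"attach": false` from `init` tells the kubelet to skip attach/detach calls, which suits a FUSE filesystem that has no block device to attach; it would replace today's approach of mounting GirderFS on the host and passing it into the container.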
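Two of Tommy's items concern the manifest: generating it from a list of item IDs, and a bug where duplicate Dataset entries were added. A minimal sketch of the deduplication idea follows, assuming a simplified, hypothetical item record and manifest shape; `Item`, `build_manifest`, and the `aggregates`/`Datasets` keys are illustrative only, not the actual girder_wholetale endpoint or the manifest.json schema. Each dataset is recorded once, keyed by its ID, no matter how many selected files come from it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """HYPOTHETICAL record for one selected file (not the Girder item model)."""
    item_id: str
    path: str
    dataset_id: Optional[str] = None  # None for files that live in the Tale workspace


def build_manifest(items: List[Item], datasets: Dict[str, dict]) -> dict:
    """Build a simplified manifest from selected items.

    `datasets` maps dataset IDs to their metadata. A dataset appears once in
    "Datasets" even when several selected items belong to it.
    """
    aggregates = []
    dataset_entries = []
    seen = set()
    for item in items:
        aggregates.append({"uri": item.path})
        if item.dataset_id is not None and item.dataset_id not in seen:
            seen.add(item.dataset_id)
            dataset_entries.append(datasets[item.dataset_id])
    return {"aggregates": aggregates, "Datasets": dataset_entries}


if __name__ == "__main__":
    meta = {"ds1": {"identifier": "ds1", "name": "Example dataset"}}
    selected = [
        Item("a", "../data/ds1/file1.csv", "ds1"),
        Item("b", "../data/ds1/file2.csv", "ds1"),
        Item("c", "../workspace/run.sh"),
    ]
    manifest = build_manifest(selected, meta)
    print(len(manifest["aggregates"]), len(manifest["Datasets"]))  # 3 1
```

Tracking seen IDs in a set keeps the first-seen order of datasets while making each membership check constant-time.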
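Tommy also added a check that a file's MIME type is supported before uploading it to DataONE. A minimal sketch of such a check uses Python's standard `mimetypes` module to guess the type from the file name. The allow-list and the fallback to `application/octet-stream` for unsupported types are assumptions for illustration, since the notes do not say which formats are accepted or what happens on a mismatch.

```python
import mimetypes
from typing import Collection

# HYPOTHETICAL allow-list; the formats DataONE actually accepts are not listed in these notes.
SUPPORTED_MIMETYPES = frozenset({
    "text/csv",
    "text/plain",
    "application/json",
    "image/png",
})
FALLBACK_MIMETYPE = "application/octet-stream"


def mimetype_for_upload(filename: str,
                        supported: Collection[str] = SUPPORTED_MIMETYPES) -> str:
    """Return the guessed MIME type if it is supported, else a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is not None and guessed in supported:
        return guessed
    return FALLBACK_MIMETYPE


if __name__ == "__main__":
    for name in ["results.csv", "notes.txt", "model.unknownext"]:
        print(name, "->", mimetype_for_upload(name))
```

Guessing from the extension is cheap but only approximate; rejecting the upload instead of falling back would be an equally valid design choice.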