2019-02-04: Development Meeting¶
Kacper, Ian, Mike H, Craig, Tim, Tommy, Sebastian
Development planning¶
Agenda:
- Updates
- Discussion
- Q. What are the major user stories of the Whole Tale environment.
- Provides interactive/active analysis environment (ala JH/Rstudio cloud)
- Supports “best practices” publishing reproducibible computational researcher
- Assumptions today: Jupyter or Rstudio environments, but the design is intended to support “any” environment (e.g., Xpra)
- Publishing the environment (assumed to be a Dockerfile) along with code/data is one path
- Validation/verification
- Roundtripping: Publish to a remote repository (DataONE member, Dataverse member) and a researcher can re-launch/re-run that Tale
- Read the frozen record from DataONE.
- Caveat: data in WT can be by reference (DOI or URL)
- Have not addressed the long-term question
- Read the frozen record from DataONE.
- Remixing: Researcher could/build on pieces of a Tale across 3 dimensions: environment, data, code
- Could be seen as “porting”
- Publish/review/verification:
- Q. What would you do “we’re sorry, this won’t work?”
- If it builds without errors, but doesn’t run as expected. It builds but doesn’t provide the same results.
- If we have a “test.sh”
- Relationship to badges
- Notebooks: How do I verify that the output is the same? It’s complicated.
- Assertions
- Static notebook via nbconvert
- Flagging the notebook – when you run this cell, this is what the value should be.
- Or, best practices – write important output to disk?
- Q. Dockerfiles/images/etc
- Q. Provenance
- Don’t pursue provenance for provenance sake, but do it for WT reproducibility – i.e., where Docker/etc fail.
- Give greater longevity to what we’re already doing.
- Run reprozip
- Don’t teach anyone about it, don’t necessarily keep the reprozip output
- Here are all of the versions of software that were running
- Run reprozip
- Crystallography example – submitting a PDB file
- If there’s something that could tell me the software I used.
- Q. What is the information that a researcher needs to capture to submit their work
- Make it easier
- Q. Would a crystallographer use Whole Tale?
- Hundreds of GB of data, fast
- Lots of compute, lots of data
- AJPS:
- https://ajps.org/ajps-replication-policy/
- Meeting notes
- https://docs.google.com/document/d/16ufQdzNDYyDswy4IXmdV_OvraZmDPu2Nkgs2mCOsirk/edit#
- Mockups
- http://www.crc.nd.edu/~kfurse/wholetale-css-mock-up-sui_f2018/browse.html
- Discussion of transparency vs reproducibility
- Q. What are the major user stories of the Whole Tale environment.
Updates¶
Kacper:
- Fixed Dataverse registration logic whole-tale/girder_wholetale#223
- We were querying a Dataverse’s service to get a list of all DV installations.
- Service was suddenly decomissioned IQSS/dataverse#5497
- Fixed File/folder icon display in external data selection, by exposing
_modelType
indataSet
- Fixed Data panel displaying only 50 folders/items whole-tale/dashboard#370
- Participated in Tale Serialization format discussion (w/ Tommy & Craig)
- Outstanding PRs:
- Handling Workspaces in “Workspace modal” (display Tale title rather than workspace name, which is an ugly uuid):
- Add ‘Workspace’ resource whole-tale/girder_wholetale/#226
- Expose
workspaceId
in Tale model whole-tale/girder_wholetale/#228
- Allow to filter images by tag whole-tale/girder_wholetale/#220 as potential way to promote “blessed” environments.
- Better error handling in Dashboard whole-tale/dashboard#373
- Handling Workspaces in “Workspace modal” (display Tale title rather than workspace name, which is an ugly uuid):
Mike H.:
- Fix for https/http issue with webdav.
- Looking into deploying on GCE
- Storage drivers are only meant for volume management. Host OS still needs to “know” how to mount special filesystem such as WebDAV, FUSE etc.
Craig:
- Mainly PR review/testing, tail format design meetings
- Tentative schedule for 3rd OSI WG call - 2/20
- Discussion of Dockerfile as archival format
- TM: What are the requirements?
- Is what DataONE/Dataverse thinks people are doing true? Would they have a different opinion.
- TM: What are the requirements?
- TT: Pull in RO/bdbag for formats
- Discussion of Dockerfile as archival format
- Scheduling 1st Social Science WG call (2/13 or 2/20)
- Provenance “Task group” (meeting notes)[https://docs.google.com/document/d/16ufQdzNDYyDswy4IXmdV_OvraZmDPu2Nkgs2mCOsirk/edit#heading=h.wsv7z8ay723f]
- Carpentry working workshop at NCSA – March 7-8
- Peter Darch - personas (waiting for access to materials)
Mike L.:
- Submitted a PR for various fixes to Run > Files > Tale Workspace
- Renamed “Select Data…” to “Import Tale Data…”
- Hid the “Home” selector temporarily on the “Import Tale Data” modal
- Updated the “Import Tale Data” modal to read from the new
/workspace
endpoint - I am working on possibly tweaking this to change “Import Tale Data” modal to list/read workspace contents from the
tale.workspaceId
instead of the/workspace
endpoint
- Discussed more of the intricacies with “Copy on Launch”
- Next steps:
- Update Manage > Data to use the
/dataset
endpoint? #357 - Copy on Launch? (API work)
- Refactored Catalog View? (UI work)
- Further planning needed with Kacper/Craig
- Update Manage > Data to use the
Tommy:
- Met with Kacper and Craig & discussed serialization format-working on P.o.C for exporting and importing
- In case anyone is wondering-we’re serializing information as JSON-LD in a file,
manifest.json
- Have manifest.json 95% done
- Need to fix a bug where intermediate folders aren’t being recorded
- workspace/folder_a/folder_b/folder_c/myFile.py
- workspace/folder_c/myFile.py
- workspace/folder_a/folder_b/folder_c/myFile.py
- Need to fix a bug where intermediate folders aren’t being recorded
- After folder fix, need to add the files to the zip file via
ziputil.ZipGenerator
- Note this doesn’t include the license file or top level readme
- In case anyone is wondering-we’re serializing information as JSON-LD in a file,
- Would like to discuss best way to handle license files
- We currently store some number of them in gwvolman
- Should this be a table in mongo?
- spdx/unique id
- common name
- Should this be a table in mongo?
- We currently store some number of them in gwvolman
- Discuss Publisher endpoint
- reference a set of licenses (should this be a csv?)
- name
- uri
- description
- POST (Add), PUT (Update), GET(retrieve)
- Question about dataSet:
- home data + registered data shows up here-this is desired, right?
- No - Update branches
- The set of all Tale files is in dataSet, or a union of other structures?
- dataSet (only external data) + workspace
- home data + registered data shows up here-this is desired, right?
Sebastian/Ian:
- No updates
- Sent email to Agnieszka to check remaining available hours
Tim:
- Questions:
- Is there a project board somewhere showing the issues being worked on right now?
- https://app.zenhub.com/workspaces/59ee2631bedf0a4d94326094/boards?repos=66147054
- What is the process for steering, prioritizing, and scheduling work?
- Broad planning happens at All-Hands meetings (every 6 months, meeting notes in Drive)
- Translated into Git issues/Epics/dev work based on a series of dev planning meetings
- Schedule dedicated meetings with PI subset for input if needed
- Review progress/priority with PI team during Tues exec calls.
- Is there a plan to enable Tales to be viewable as static HTML without running the underlying container (like static view of Jupyter notebooks in GitHub)?
- Related-ish (this is where we render the metadata, but the Tale needs to be running to view)? https://github.com/whole-tale/dashboard/issues/349
- Ala Github integration with the nbviewer.
- Answer: No, but we can understand how this might be useful in some contexts (Jupyter, Rmarkdown), but without running them (e.g., nbconvert)
- https://help.github.com/articles/working-with-jupyter-notebook-files-on-github/
- Q. Is the lifetime of a Tale the lifetime of running the container – full usability?
- Is it feasible to support the following user story: “As a researcher submitting my work for publication in a journal I want to be able to share my work in runnable form with anonymous reviewers by providing a link to a Tale in my cover letter to the editor.”
- Good link that works as long as the review process
- Yes, it’s feasible, but there is some question about anonymity
- Anonymous access – can we require them to login so we can track access, even though you won’t know who they were.
- vs providing a token or something similar
- Q. Is this the right use case? v Odum (non anonymous)
- Is there a project board somewhere showing the issues being worked on right now?