2019-02-04: Development Meeting

Kacper, Ian, Mike H, Craig, Tim, Tommy, Sebastian

Development planning

Agenda:

  • Updates
  • Discussion
    • Q. What are the major user stories of the Whole Tale environment.
      • Provides interactive/active analysis environment (ala JH/Rstudio cloud)
      • Supports “best practices” publishing reproducibible computational researcher
        • Assumptions today: Jupyter or Rstudio environments, but the design is intended to support “any” environment (e.g., Xpra)
        • Publishing the environment (assumed to be a Dockerfile) along with code/data is one path
        • Validation/verification
      • Roundtripping: Publish to a remote repository (DataONE member, Dataverse member) and a researcher can re-launch/re-run that Tale
        • Read the frozen record from DataONE.
          • Caveat: data in WT can be by reference (DOI or URL)
        • Have not addressed the long-term question
      • Remixing: Researcher could/build on pieces of a Tale across 3 dimensions: environment, data, code
        • Could be seen as “porting”
      • Publish/review/verification:
    • Q. What would you do “we’re sorry, this won’t work?”
      • If it builds without errors, but doesn’t run as expected. It builds but doesn’t provide the same results.
      • If we have a “test.sh”
        • Relationship to badges
      • Notebooks: How do I verify that the output is the same? It’s complicated.
        • Assertions
        • Static notebook via nbconvert
        • Flagging the notebook – when you run this cell, this is what the value should be.
        • Or, best practices – write important output to disk?
    • Q. Dockerfiles/images/etc
    • Q. Provenance
      • Don’t pursue provenance for provenance sake, but do it for WT reproducibility – i.e., where Docker/etc fail.
      • Give greater longevity to what we’re already doing.
        • Run reprozip
          • Don’t teach anyone about it, don’t necessarily keep the reprozip output
          • Here are all of the versions of software that were running
      • Crystallography example – submitting a PDB file
        • If there’s something that could tell me the software I used.
        • Q. What is the information that a researcher needs to capture to submit their work
          • Make it easier
        • Q. Would a crystallographer use Whole Tale?
          • Hundreds of GB of data, fast
          • Lots of compute, lots of data
      • AJPS:
        • https://ajps.org/ajps-replication-policy/
      • Meeting notes
        • https://docs.google.com/document/d/16ufQdzNDYyDswy4IXmdV_OvraZmDPu2Nkgs2mCOsirk/edit#
      • Mockups
        • http://www.crc.nd.edu/~kfurse/wholetale-css-mock-up-sui_f2018/browse.html
      • Discussion of transparency vs reproducibility

Updates

Kacper:

Mike H.:

  • Fix for https/http issue with webdav.
  • Looking into deploying on GCE
    • Storage drivers are only meant for volume management. Host OS still needs to “know” how to mount special filesystem such as WebDAV, FUSE etc.

Craig:

  • Mainly PR review/testing, tail format design meetings
  • Tentative schedule for 3rd OSI WG call - 2/20
    • Discussion of Dockerfile as archival format
      • TM: What are the requirements?
        • Is what DataONE/Dataverse thinks people are doing true? Would they have a different opinion.
    • TT: Pull in RO/bdbag for formats
  • Scheduling 1st Social Science WG call (2/13 or 2/20)
  • Provenance “Task group” (meeting notes)[https://docs.google.com/document/d/16ufQdzNDYyDswy4IXmdV_OvraZmDPu2Nkgs2mCOsirk/edit#heading=h.wsv7z8ay723f]
  • Carpentry working workshop at NCSA – March 7-8
  • Peter Darch - personas (waiting for access to materials)

Mike L.:

  • Submitted a PR for various fixes to Run > Files > Tale Workspace
    • Renamed “Select Data…” to “Import Tale Data…”
    • Hid the “Home” selector temporarily on the “Import Tale Data” modal
    • Updated the “Import Tale Data” modal to read from the new /workspace endpoint
    • I am working on possibly tweaking this to change “Import Tale Data” modal to list/read workspace contents from the tale.workspaceId instead of the /workspace endpoint
  • Discussed more of the intricacies with “Copy on Launch”
  • Next steps:
    • Update Manage > Data to use the /dataset endpoint? #357
    • Copy on Launch? (API work)
    • Refactored Catalog View? (UI work)
      • Further planning needed with Kacper/Craig

Tommy:

  • Met with Kacper and Craig & discussed serialization format-working on P.o.C for exporting and importing
    • In case anyone is wondering-we’re serializing information as JSON-LD in a file, manifest.json
    • Have manifest.json 95% done
      • Need to fix a bug where intermediate folders aren’t being recorded
        • workspace/folder_a/folder_b/folder_c/myFile.py
          • workspace/folder_c/myFile.py
    • After folder fix, need to add the files to the zip file via ziputil.ZipGenerator
    • Note this doesn’t include the license file or top level readme
  • Would like to discuss best way to handle license files
    • We currently store some number of them in gwvolman
      • Should this be a table in mongo?
        • spdx/unique id
        • common name
  • Discuss Publisher endpoint
    • reference a set of licenses (should this be a csv?)
    • name
    • uri
    • description
    • POST (Add), PUT (Update), GET(retrieve)
  • Question about dataSet:
    • home data + registered data shows up here-this is desired, right?
      • No - Update branches
    • The set of all Tale files is in dataSet, or a union of other structures?
      • dataSet (only external data) + workspace

Sebastian/Ian:

  • No updates
  • Sent email to Agnieszka to check remaining available hours

Tim:

  • Questions:
    • Is there a project board somewhere showing the issues being worked on right now?
      • https://app.zenhub.com/workspaces/59ee2631bedf0a4d94326094/boards?repos=66147054
    • What is the process for steering, prioritizing, and scheduling work?
      • Broad planning happens at All-Hands meetings (every 6 months, meeting notes in Drive)
      • Translated into Git issues/Epics/dev work based on a series of dev planning meetings
      • Schedule dedicated meetings with PI subset for input if needed
      • Review progress/priority with PI team during Tues exec calls.
    • Is there a plan to enable Tales to be viewable as static HTML without running the underlying container (like static view of Jupyter notebooks in GitHub)?
      • Related-ish (this is where we render the metadata, but the Tale needs to be running to view)? https://github.com/whole-tale/dashboard/issues/349
      • Ala Github integration with the nbviewer.
      • Answer: No, but we can understand how this might be useful in some contexts (Jupyter, Rmarkdown), but without running them (e.g., nbconvert)
      • https://help.github.com/articles/working-with-jupyter-notebook-files-on-github/
      • Q. Is the lifetime of a Tale the lifetime of running the container – full usability?
    • Is it feasible to support the following user story: “As a researcher submitting my work for publication in a journal I want to be able to share my work in runnable form with anonymous reviewers by providing a link to a Tale in my cover letter to the editor.
      • Good link that works as long as the review process
      • Yes, it’s feasible, but there is some question about anonymity
        • Anonymous access – can we require them to login so we can track access, even though you won’t know who they were.
        • vs providing a token or something similar
      • Q. Is this the right use case? v Odum (non anonymous)