2019-01-14: Development Meeting

Kacper, Tommy, Mike L, Mike H, Craig, Adam

Development planning

Agenda:

  • Updates
  • Cleanup
    • Close PR https://github.com/whole-tale/dashboard/pull/341?
    • Rename PR https://github.com/whole-tale/dashboard/pull/347
  • Mike H task options: 1) Copy on Launch or 2) Kubernetes
    • KK: We’re doing certain things with Fuse, Girderfs, WebDav to mount filesystems that are not particularly container-orchestration friendly.
      • If we migrate to Kubernetes, there may be a better way to do what we’re doing.
  • Publishing design
    • TT
      • What we really need is the manifest.json file
      • Local exports would be identified by location in the bag
    • CW:
      • Q. Should we snapshot? How do we make sure the manifest is representative of what is in the workspace (unchanged)?
      • KK: We need to version or tag anyway – to stage for publishing
      • Q. Where would DataONE and Dataverse publish live?
        • KK: Abstract common operations into Girder WholeTale plugin. Move provider specific code into separate plugins.
        • e.g., abstract exporter in WT
          • Create metadata.json
          • Stage workspace, etc
        • Have a plugin that depends on that. Expand on the plugin side.
        • Zenodo may use the zip one, doing nothing but providing some metadata.
        • We need to come up with all of the specific things that integrating with external providers requires
          • e.g., consuming tale, publishing tale, registering data
            • Common plugin framework
            • e.g., dataverse integration endpoint
      • Q. What would publish to Dataverse look like:
        • server/lib/dataverse/publish.py
          • Implements DataversePublishProvider
          • Based on PublishedProvider
            • Create client: connect to DataOne/Dataverse
            • Generate publisher metadata
              • Dataverse - SWORD XML
              • DataONE - EML + ResourceMap
            • Generate Tale metadata (tale.yaml/manifest.json)
            • Uploading a file (in general upload Girder Object to Publisher)
          • Discussion of
            • https://github.com/whole-tale/gwvolman/blob/master/gwvolman/publish.py#L638
          • KK: With Girder3, when you create a plugin (e.g., DataONE plugin), it can have 3 parts – UI changes, server changes, celery_worker plugins.
      • Q. MH: What are the problems?
        • Might just put it into a zipfile
        • Results might keep as a separate file
        • There’s a structure imposed by the publisher side

Updated

  • Kacper:
    • Modification to User Data handling wholetale#205
      • ‘My datasets’ view could be implemented with this.
    • Closer integration with Sessions:
      • https://hackmd.hub.yt/v_HkNg5FTsWT-prMY1PHkw
      • New plugin: globus_handler to avoid circular dependencies between wholetale and wt_data_manager plugins
      • Allow to create a Session using just taleId (WIP on wt_data_manager)
      • Converted involatileData to dataSet branch:invdata_to_dataset
  • Adam:
    • No updates
    • We should merge outstanding PRs and file separate issue
  • Tommy:
    • Currently creating Tales with the bagit-ro spec
      • LIGO_Tutorial
      • Replication of Machine Learning Research: Mullainathan
      • [Globus Data Tale]
        • Source for Logan’s tale: https://github.com/whole-tale/Discover_MG_CoVZr
      • Majority of work is writing the manifest.json
    • Testing the dataone URL changes
      • Data registration
      • Publishing
      • ORCID Login & Redirect
    • Write docs for adding a publisher to gwvolman
      • Parallel to adding an import provider
      • Design for what publishing looks like in the context of the revised format
    • Reviews
      • https://github.com/whole-tale/dashboard/pull/341
      • https://github.com/whole-tale/dashboard/pull/339
    • Re-check with Bryce/Matt about the possibility of publishing bag archives to DataONE
      • Main issue is that we lose the URI space for all of the internal files
        • Can’t show prov relations of files in the bag
        • Won’t be able to create an in depth EML doc (effects the UI and searching)
  • Craig:
    • NSF report material
    • 36 month review material
    • Tale format design with Tommy
    • repo2docker.version PR WIP
    • Change build_tale_image to use repo2docker WIP
    • Add /build/{id}/tale endpoint WIP
    • Things to note:
      • Will no longer build Image objects (images become metadata-only), Recipe is obsolete. Lots of surgery here.
      • Will impact publishing and Tale format
        • Tommy: I’m pulling info out of the recipe data structure, it shouldn’t be a large issue handling it.
    • Open issues:
      • How to handle iframes
      • When to build the image for a Tale
      • How to know the workspace changed (git commit hash)
      • Do we build the workspace into the image?
      • How to handle versioning repo2docker
  • Mike H:
    • Worked on session caching in girderfs, PR issued
  • Mike L:
    • Spent most of the past week sick / recovering
      • Sorry - I am very behind on reviews
    • Had more discussions with Craig/Kacper about the Tale image rebuild functionality
    • Almost ready to PR the gwvolman / REST endpoints for changing the running Tale image and restarting the instance
      • Still fixing last few issues found in testing
        • I think something is angry that I changed the schema here, so instances are refusing to launch or restart
      • Some confusion/conflation between “update instance” and “restart instance”
        • Someone (Kacper? Craig?) expressed interest in a more generic “update instance” call, so I renamed my methods from restartInstance to updateInstance throughout
        • It turns out that there is already a generic “updateInstance” that is used internally but not exposed as an API endpoint which was clobbered by mine - this is one case where instances failed to launch (see above)
        • ~~My question: Can the genericized “update instance” be a separate issue while I focus on wiring up the restart? Otherwise, this creates many more questions about detecting when to / when not to update the Docker service which have yet not been discussed~~