2018-12-05: Tale Format design
Tommy, Kacper, Craig
- Tommy
- Looked at the main Dataverse metadata file and the Dublin Core (DC) vocabulary terms covering licenses (type/file), name, author, etc.
- Still shooting for an RDF representation
- Review of notes:
- Distinction between export and publishing
- TT: When we do publish, we will have some sort of metadata document
- Craig:
- 12/04 Dataverse community call covered publishing Git repos and file hierarchy support
- Long-standing issue to support Zenodo-like GitHub publication
- Gigantum, Binder, and WT are all talking about publishing to Dataverse
- Phase 1 recommendation is to publish Git repos as zipfiles
- Gigantum is publishing Git repos to Dataverse as zipfiles to preserve hierarchy, because Dataverse does not support file hierarchy.
- Reiterated the Odum CoRE2 case: adding a Dockerfile to an existing Dataverse collection/dataset as a way to provide the environment
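A minimal sketch of the "publish a repo as a zipfile" idea, using only the Python standard library. It wraps the repo archive in an outer zip, on the assumption (taken from the discussion notes, where a plain zip is unpacked and a zipped zip is published as a zipfile) that the inner archive is what survives upload. The file names, the `repo.zip` name, and both helper functions are hypothetical.

```python
import io
import zipfile


def zip_tree(files):
    """Return zip bytes for a mapping of relative path -> file content (bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Sorted order keeps the archive listing deterministic.
        for path, data in sorted(files.items()):
            zf.writestr(path, data)
    return buf.getvalue()


def double_zip(files, inner_name="repo.zip"):
    """Zip the tree, then wrap that zip in an outer zip (hypothetical helper)."""
    inner = zip_tree(files)
    return zip_tree({inner_name: inner})


# Hypothetical repository content with a nested directory layout.
repo = {
    "README.md": b"# demo\n",
    "src/analysis.py": b"print('hello')\n",
    "data/raw/input.csv": b"a,b\n1,2\n",
}

outer_bytes = double_zip(repo)

# Simulate unpacking the outer layer: the inner zip still holds the hierarchy.
with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
    outer_names = outer.namelist()
    with zipfile.ZipFile(io.BytesIO(outer.read("repo.zip"))) as inner:
        inner_names = inner.namelist()
```

After the outer layer is unpacked, only `repo.zip` remains as a single file, and the directory structure is recoverable from it.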
- Discussion
- TT: DC relates to the document we need to create for the Dataverse (DVN) SWORD API
- http://guides.dataverse.org/en/latest/api/sword.html#create-a-dataset-with-an-atom-entry
- KK: Dataverse supports adding a dataset as a zip: a plain zip is unpacked, while a zipped zip is published as a zipfile
- CW: Is RDF for the internal structure, which then gets translated for each publisher?
- TT: There may not be a 1:1 mapping
- CW:
- Publish metadata to DV/DataONE in their terms/format (i.e., their metadata) for discovery, etc.
- Facilitates discovery – maximizes the value of integration with their system
- Publishing “tale.rdf” or “tale.yaml” allows us to know what we’re dealing with: serialization/deserialization for export/import
- Spectrum of “tales” that might be in Dataverse/DataONE
- Created outside of WT
- Data + Code and no environment
- Dockerfile + Data + code
- Binder-like structure
- We would use the DataONE/DV metadata during tale import
- Tale – tale.rdf/etc
- Can tales be run elsewhere?
- What we have that is different is data handling and mounting
- Q. What does DataONE publish format look like today?
- Data package has 4? objects:
- EML document: high-level overview of what’s in the package, used by Metacat UI
- Datafile(s): mydata.txt
- ORE/RDF: describes the relationships between files. Package doesn’t exist without this file.
- Metadata document for every file
- Discussion of DataONE API:
- https://releases.dataone.org/online/api-documentation-v2.0/apis/index.html
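As a rough illustration of the package layout described above, the relationships between the objects can be written as simple triples. This is a sketch only: the identifiers and the helper function are hypothetical, and the predicate names are borrowed from the ORE and CiTO vocabularies as an assumption. What DataONE actually emits should be checked against the API documentation linked above.

```python
def package_relationships(aggregation, metadata_doc, data_files):
    """Build (subject, predicate, object) triples for a data package.

    Hypothetical helper: the aggregation groups every member object, and
    the metadata document is linked to each data file it describes.
    """
    triples = []
    for member in [metadata_doc, *data_files]:
        triples.append((aggregation, "ore:aggregates", member))
    for data_file in data_files:
        triples.append((metadata_doc, "cito:documents", data_file))
    return triples


# Hypothetical identifiers; "mydata.txt" is the data file named in the notes.
triples = package_relationships(
    aggregation="package#aggregation",
    metadata_doc="metadata.eml.xml",
    data_files=["mydata.txt"],
)
for triple in triples:
    print(triple)
```

The point of the sketch is that the package is defined by these relationships rather than by any container file, which is why it does not exist without the ORE/RDF document.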
- Q. Why don’t we publish a zipfile to DataONE?
- TT to follow up (quality, stigma, etc.)
- Q. Is publishing to Zenodo via GitHub always a zipfile?
- Yes
- Q. Do we need to be able to handle zipped repos?
- Yes, if we want to be able to run Binder and Gigantum projects
- Taking for granted that we publish a “tale.yml/rdf” file in the package to each repo
- Primary purpose is for us to read it back in.
- Q. Why go through the process of RDF-izing it if we’re the only ones using it?
- Why deserialize RDF to put things into Girder in JSON?
- Cost/value?
- RDF is expensive to create: we have to make sure that everything maps correctly
- URI space + resolution – hosting a service to handle concepts
- YAML is cheap and easy but non-standard
- schema.org
- DC is non-controversial, low-hanging fruit
- Mapping “tale.yaml” into DC/Schema.org space is super easy
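To illustrate why the mapping is considered cheap, here is a minimal sketch that takes a tale description (as it might look after loading a “tale.yaml”) and emits both a Dublin Core style record and a schema.org JSON-LD document. The tale fields, their values, and both functions are hypothetical; no tale format has been settled in these notes. The Dublin Core and schema.org property names are real terms, but the choice of which field maps to which term is an assumption.

```python
import json

# Hypothetical tale description; the field names are illustrative only.
tale = {
    "name": "Example Tale",
    "authors": ["Ada Example"],
    "license": "CC-BY-4.0",
    "description": "Illustrative record only.",
}


def to_dublin_core(tale):
    """Map tale fields onto Dublin Core terms (assumed mapping)."""
    return {
        "dcterms:title": tale["name"],
        "dcterms:creator": list(tale["authors"]),
        "dcterms:license": tale["license"],
        "dcterms:description": tale["description"],
    }


def to_schema_org(tale):
    """Map tale fields onto a schema.org CreativeWork in JSON-LD (assumed mapping)."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": tale["name"],
        "author": [{"@type": "Person", "name": a} for a in tale["authors"]],
        "license": tale["license"],
        "description": tale["description"],
    }


dc_record = to_dublin_core(tale)
jsonld_text = json.dumps(to_schema_org(tale), indent=2)
print(jsonld_text)
```

Each mapping is a flat field-to-term lookup, with no URI space to host, which is the cost difference from a full RDF representation raised above.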
- Provenance
- Q. How is provenance stored in DataONE?
- Always added to ORE document
- RDF/XML
- Q. Is provenance related to a single file or to all files in the dataset?
- Provenance is defined between two objects in a dataset
- https://www.dataone.org/best-practices/provenance
- Q. In Dataverse?
- Provenance file or free-text provenance description?
- http://guides.dataverse.org/en/latest/api/native-api.html?highlight=provenance#provenance
- JSON format
- http://guides.dataverse.org/en/latest/user/dataset-management.html#data-provenance
- Q. Do we really need provenance information included in the “tale.yml”?
- Q. What does WT do about provenance info?
- Use case: Diffing prov between runs
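The diffing use case can be sketched under the assumption that each run's provenance is available as a set of relationships between two objects, in line with the "between two objects" description above. The file names, the `diff_provenance` helper, and the choice of predicate names (borrowed from W3C PROV for illustration) are all hypothetical. How WT would actually record provenance is an open question in these notes.

```python
def diff_provenance(run_a, run_b):
    """Compare two runs' provenance, each an iterable of
    (object, relationship, object) tuples. Hypothetical helper."""
    a, b = set(run_a), set(run_b)
    return {
        "removed": sorted(a - b),    # present only in the first run
        "added": sorted(b - a),      # present only in the second run
        "unchanged": sorted(a & b),  # present in both runs
    }


# Hypothetical provenance for two runs of the same tale.
run1 = [
    ("plot.png", "wasDerivedFrom", "mydata.txt"),
    ("plot.png", "wasGeneratedBy", "analysis.py"),
]
run2 = [
    ("plot.png", "wasDerivedFrom", "mydata.txt"),
    ("plot.png", "wasGeneratedBy", "analysis_v2.py"),
]

changes = diff_provenance(run1, run2)
for kind, edges in changes.items():
    print(kind, edges)
```

Treating provenance as a set of pairwise relationships makes the diff a plain set difference, regardless of whether the edges are eventually serialized as RDF/XML, JSON, or YAML.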
- Gist:
- Dublin Core/schema.org
- Provenance
- External data