General
The CSV templates offer a pre-defined data format for importing metadata into
pyDataverse and exporting it again.
They support all three Dataverse Software data types: Dataverse collections, Datasets, and Datafiles.
CSV is an open file format that is easy to work with for both humans and
machines. Files can be opened in your spreadsheet software and edited manually,
or processed with your favourite programming language.
The CSV format can also serve as an exchange format, a bridge
between different data formats and programming languages.
The CSV templates and the workflow described below are especially useful for:
- Mass imports into a Dataverse installation: the data to be imported can either be collected manually (e.g. digitization of paper records) or created by machines (from any data source you have).
- Data exchange: share pyDataverse data with any other system in an open, machine-readable format.
The CSV templates are licensed under CC BY 4.0.
Content
The templates don’t come empty. They are pre-filled with supportive information to help you get started.
Each row is one entry:
- Column names: The attribute name for each column. You can add and remove columns as you want. The pre-filled columns are a recommendation, as they cover all metadata for the specific data type and the most common internal fields for handling the workflow. This is the only row that must not be deleted. There are three established prefixes so far (you can define your own if you want):
  - org.: Organization-specific information to handle the data workflow later on.
  - dv.: Dataverse-specific metadata, used for API uploads. Use the exact Dataverse software attribute name after the prefix, so the metadata gets imported properly.
  - alma.: ALMA-specific information.
- Description: Description of the Dataverse software attribute. This row is for support purposes only, and must be deleted before usage.
- Attribute type: Describes the type of the attribute (serial, string or numeric). Strings can also be valid JSON strings to express more complex data structures. This row is for support purposes only, and must be deleted before usage.
- Example: Contains a concrete example. To start adding your own data, it is often good to get started by copying the example for it. This row is for support purposes only, and must be deleted before usage.
- Multiple: TRUE if multiple entries are allowed (boolean). This row is for support purposes only, and must be deleted before usage.
- Sub-keys: TRUE if the attribute contains sub-keys (boolean); only applicable to JSON strings. This row is for support purposes only, and must be deleted before usage.
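The prefix convention above can be applied mechanically. A minimal sketch in Python, grouping an illustrative (not complete) set of column names by their prefix:

```python
# Group template column names by their prefix ("org.", "dv.", "alma.").
# These column names are illustrative, not the full template.
columns = ["org.id", "org.to_upload", "dv.title", "dv.author", "alma.mmsid"]

by_prefix = {}
for col in columns:
    # Split at the first dot: "dv.title" -> ("dv", "title")
    prefix, _, name = col.partition(".")
    by_prefix.setdefault(prefix, []).append(name)

print(by_prefix)
# {'org': ['id', 'to_upload'], 'dv': ['title', 'author'], 'alma': ['mmsid']}
```

Keeping the prefixes machine-splittable like this is what lets one file carry workflow fields, Dataverse metadata, and external-system identifiers side by side.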
Usage
To use the CSV templates, we propose the following steps as a best practice.
The workflow is the same for Dataverse collections, Datasets and Datafiles.
There is also a more detailed tutorial on how to use the CSV templates
for mass imports in the
User Guide - Advanced.
The CSV templates can be found in src/pyDataverse/templates/
in the GitHub repository.
Adapt CSV template(s)
First, adapt the CSV templates to your own needs and workflow.
- Open a template file and save it: Just start by copying the file and changing its filename to something descriptive (e.g. 20200117_datasets.csv).
- Adapt columns: Then change the pre-defined columns (attributes) to your needs.
- Add metadata: Add metadata in the first empty row. Closely following the example is often a good starting point, especially for JSON strings.
- Remove support rows: Once you are used to the workflow, delete the support rows 2 to 6. This must be done before you use the file with pyDataverse!
- Save and use: Once you have finished editing, save the CSV file and import it into pyDataverse.
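The import step can be sketched with the standard library. This assumes a hypothetical cleaned file datasets.csv (support rows already removed) whose dv. columns carry the Dataverse metadata, with JSON-string cells decoded via json.loads:

```python
import csv
import json

# Create a small cleaned CSV file for the sketch (normally this is the
# file you saved from your spreadsheet software).
with open("datasets.csv", "w", newline="") as f:
    f.write('org.id,dv.title,dv.keywords\n')
    f.write('1,My Dataset,"[""csv"", ""metadata""]"\n')

datasets = []
with open("datasets.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Keep only Dataverse attributes and strip the "dv." prefix.
        metadata = {k[len("dv."):]: v
                    for k, v in row.items() if k.startswith("dv.")}
        # Decode cells that contain JSON strings (here: dv.keywords).
        metadata["keywords"] = json.loads(metadata["keywords"])
        datasets.append(metadata)

print(datasets)
# [{'title': 'My Dataset', 'keywords': ['csv', 'metadata']}]
```

Each resulting dict can then be passed on to pyDataverse's metadata models and API wrappers, as shown in the tutorial referenced below.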
Use the CSV files
For further use of the CSV files with pyDataverse, for example:
- adding metadata to the CSV files
- importing CSV files
- uploading data and metadata via API
… have a look at the Data Migration Tutorial.