CSV Templates

General

The CSV templates offer a pre-defined data format for importing metadata into pyDataverse and exporting metadata from it. They support all three Dataverse Software data-types: Dataverse collections, Datasets and Datafiles.

CSV is an open file format that works well for both humans and machines. It can be opened and edited manually with your spreadsheet software, or processed with your favoured programming language.

The CSV format can also serve as an exchange format, a bridge between all kinds of data formats and programming languages.

The CSV templates and the workflow described below are especially useful for:

  • Mass imports into a Dataverse installation: The data to be imported can either be collected manually (e.g. digitized from paper records) or created by machines (coming from any data source you have).
  • Data exchange: Share pyDataverse data with any other system in an open, machine-readable format.

The CSV templates are licensed under CC BY 4.0.

Data format

  • Separator: ,
  • Encoding: utf-8
  • Quotation: ". Note: Inside JSON strings, quotation marks must be escaped with a backslash (e.g. " becomes \").
  • Boolean: We recommend using TRUE and FALSE as boolean values. Note: These may be changed when you open the file in your preferred spreadsheet software (e.g. LibreOffice), depending on the software and your operating system's settings.
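
As a rough sketch, the following snippet reads a template CSV with Python's standard csv module, using the separator, encoding and quotation settings listed above (the filename datasets.csv is an assumption):

    import csv

    # Read a template CSV: comma as separator, UTF-8 encoding,
    # " as quotation character.
    with open("datasets.csv", "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter=",", quotechar='"')
        for row in reader:
            print(row)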

Content

The templates don’t come empty. They are pre-filled with supportive information to help you get started. Each row is one entry:

  1. Column names: The attribute name for each column. You can add and remove columns as you want. The pre-filled columns are a recommendation, as they cover all metadata for the specific data-type and the most common internal fields for handling the workflow. This is the only row that’s not allowed to be deleted. There are three established prefixes so far (you can define your own if you want):
     1. org.: Organization-specific information to handle the data workflow later on.
     2. dv.: Dataverse-specific metadata, used for API uploads. Use the exact Dataverse software attribute name after the prefix, so the metadata gets imported properly.
     3. alma.: ALMA-specific information.
  2. Description: Description of the Dataverse software attribute. This row is for support purposes only, and must be deleted before usage.
  3. Attribute type: Describes the type of the attribute (serial, string or numeric). Strings can also be valid JSON strings, to allow for more complex data structures. This row is for support purposes only, and must be deleted before usage.
  4. Example: Contains a concrete example. Copying the example is often a good way to start adding your own data. This row is for support purposes only, and must be deleted before usage.
  5. Multiple: TRUE, if multiple entries are allowed (boolean). This row is for support purposes only, and must be deleted before usage.
  6. Sub-keys: TRUE, if sub-keys are part of the attribute (boolean). Only applicable to JSON strings (see the parsing sketch after this list). This row is for support purposes only, and must be deleted before usage.
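
For attributes whose type is a JSON string (typically those with Multiple and Sub-keys set to TRUE), the cell value can be parsed into a Python data structure once the CSV has been read. A minimal sketch, with an attribute value that is purely illustrative and not taken from the shipped templates:

    import json

    # A cell value as it appears after the CSV reader has resolved the
    # surrounding quotation and the \" escaping.
    cell = '[{"authorName": "Doe, Jane", "authorAffiliation": "Example University"}]'

    # Multiple = TRUE  -> the JSON string holds a list of entries.
    # Sub-keys = TRUE  -> each entry is an object with sub-keys.
    authors = json.loads(cell)
    print(authors[0]["authorName"])  # -> Doe, Jane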

Usage

To use the CSV templates, we propose the following steps as a best practice. The workflow is the same for Dataverse collections, Datasets and Datafiles.

There is also a more detailed tutorial on how to use the CSV templates for mass imports in the User Guide - Advanced.

The CSV templates can be found in src/pyDataverse/templates/ in the GitHub repository.

Adapt CSV template(s)

First, adapt the CSV templates to your own needs and workflow.

  1. Open a template file and save it: Just start by copying the file and changing its filename to something descriptive (e.g. 20200117_datasets.csv).
  2. Adapt columns: Then change the pre-defined columns (attributes) to your needs.
  3. Add metadata: Add metadata in the first empty row. Closely following the example is often a good starting point, especially for JSON strings.
  4. Remove supporting rows: Once you are used to the workflow, you can delete the supportive rows 2 to 6. This must be done before you use the template for pyDataverse!
  5. Save and use: Once you have finished editing, save the CSV file and import it into pyDataverse.
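
If you prefer to do step 4 programmatically instead of in a spreadsheet, a minimal sketch (the filenames are assumptions) that copies a template and drops the supportive rows 2 to 6 could look like this:

    import csv

    with open("datasets.csv", "r", encoding="utf-8", newline="") as src, \
         open("20200117_datasets.csv", "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for i, row in enumerate(reader, start=1):
            # Keep row 1 (column names) and the data rows; skip the
            # supportive rows 2 to 6.
            if 2 <= i <= 6:
                continue
            writer.writerow(row)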

Use the CSV files

For further usage of the CSV files with pyDataverse, for example:

  • adding metadata to the CSV files
  • importing CSV files
  • uploading data and metadata via API

… have a look at the Data Migration Tutorial.
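
As a hedged sketch of such a workflow (instance URL, API token, filename and collection alias are placeholders, and the exact call signatures may differ between pyDataverse versions; see the Data Migration Tutorial for the full picture):

    from pyDataverse.api import NativeApi
    from pyDataverse.models import Dataset
    from pyDataverse.utils import read_csv_as_dicts

    # Read the adapted CSV file; each row becomes one dict.
    rows = read_csv_as_dicts("20200117_datasets.csv")

    api = NativeApi("https://demo.dataverse.org", "YOUR_API_TOKEN")

    for row in rows:
        # Keep only the "dv." columns and strip the prefix, so the keys
        # match the Dataverse attribute names.
        metadata = {key[3:]: value for key, value in row.items()
                    if key.startswith("dv.")}
        ds = Dataset()
        ds.set(metadata)
        # Upload the dataset metadata to a Dataverse collection.
        resp = api.create_dataset("parent_alias", ds.json())
        print(resp.json())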

Export from pyDataverse

If you want to export your metadata from a pyDataverse object (Dataverse, Dataset, Datafile) to a CSV file:

  1. Get the metadata as dict (Dataverse.get(), Dataset.get() or Datafile.get()).
  2. Pass the dict to write_dicts_as_csv(). Note: Use the internal attribute lists from pyDataverse.models to get a complete list of fieldnames for each Dataverse data-type (e.g. Dataset.__attr_import_dv_up_citation_fields_values).
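
A minimal sketch of these two steps (the metadata and the output filename are illustrative, and the exact signature of write_dicts_as_csv() may differ between pyDataverse versions):

    from pyDataverse.models import Dataset
    from pyDataverse.utils import write_dicts_as_csv

    ds = Dataset()
    ds.set({"title": "My example dataset"})  # metadata assumed to be set already

    # 1. Get the metadata as a dict.
    metadata = ds.get()

    # 2. Write it to a CSV file. The fieldnames here are taken from the
    #    dict itself; for a complete list per data-type, use the internal
    #    attribute lists from pyDataverse.models mentioned above.
    write_dicts_as_csv([metadata], list(metadata.keys()), "datasets_export.csv")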

Resources