Use-Cases

For a basic introduction to pyDataverse, visit User Guide - Basic Usage. For information on more advanced uses, visit User Guide - Advanced Usage.

Data Migration

Importing lots of data from data sources outside a Dataverse installation can be done with the help of the CSV templates. Simply add your data to the CSV files, import the files into pyDataverse, and then upload the data and metadata via the API.

The following mappings currently exist:

  • CSV
    • CSV 2 pyDataverse (Tutorial)
    • pyDataverse 2 CSV (Tutorial)
  • Dataverse Upload JSON
    • JSON 2 pyDataverse
    • pyDataverse 2 JSON
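The CSV workflow might look like the following sketch. The column names, base URL, parent collection alias, and API token are hypothetical placeholders, and the upload helper follows pyDataverse's NativeApi/Dataset interface without being executed here:

```python
import csv
import io

def csv_rows_to_metadata(csv_text):
    """Map each row of a CSV template export to a metadata dict."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def upload_datasets(base_url, api_token, parent_alias, metadata_dicts):
    """Upload each mapped row as a Dataset via the Native API."""
    # Imported here so the pure mapping above works without pyDataverse.
    from pyDataverse.api import NativeApi
    from pyDataverse.models import Dataset

    api = NativeApi(base_url, api_token)
    for data in metadata_dicts:
        ds = Dataset()
        ds.set(data)  # fill the metadata model from the CSV row
        api.create_dataset(parent_alias, ds.json())

rows = csv_rows_to_metadata('title,author\n"My Study","Doe, John"\n')
```

Keeping the CSV parsing separate from the upload call makes the mapping step easy to test before anything touches a live installation.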

If you would like to add an additional mapping, we welcome contributions!

Testing

Create test data for integrity tests (DevOps)

Get full lists of all Dataverse collections, Datasets, and Datafiles of an installation, or of a subset of it. The results are stored in JSON files, which can then be used to run data integrity tests and verify data completeness. This is typically useful after an upgrade or a Dataverse migration. The data integrates easily into aussda_tests and into any CI build tool.

The general steps are: collect the identifiers of the Dataverse collections, Datasets, and Datafiles, fetch their metadata via the API, and write the results to JSON files.
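A minimal sketch of such an inventory, assuming the file-name prefix and the example PID are hypothetical and that the fetch helper (which uses pyDataverse's `NativeApi.get_dataset`) would only run against a reachable installation:

```python
import json

def build_inventory(collections, datasets, datafiles):
    """Group the fetched lists under one key per entity type."""
    return {"dataverses": collections,
            "datasets": datasets,
            "datafiles": datafiles}

def save_inventory(inventory, path_prefix="inventory"):
    """Write one JSON file per entity type for later integrity tests."""
    for name, items in inventory.items():
        with open(f"{path_prefix}_{name}.json", "w", encoding="utf-8") as f:
            json.dump(items, f, indent=2)

def fetch_dataset_metadata(base_url, api_token, pids):
    """Fetch the full metadata of each Dataset PID via the Native API."""
    from pyDataverse.api import NativeApi  # needs a running installation
    api = NativeApi(base_url, api_token)
    return [api.get_dataset(pid).json() for pid in pids]

inventory = build_inventory(["aussda"], ["doi:10.5072/FK2/EXAMPLE"], [])
```

The resulting JSON files can be checked into a test fixture directory and compared against a fresh export after an upgrade or migration.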

Mass removal of data in a Dataverse installation (DevOps)

After testing, you often have to remove the Dataverse collections you created, together with the Datasets and Datafiles inside them. Removing them all at once can be tricky, but pyDataverse helps you do it with only a few commands.

This functionality is not yet fully implemented in pyDataverse, but you can find it in aussda_tests.
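Until then, the deletion order can be sketched as follows. The `api` argument is any object offering `delete_dataset()` and `delete_dataverse()`, such as a pyDataverse `NativeApi` instance; the PID and alias below are hypothetical, and the dry run uses a recording stub instead of a live installation:

```python
def mass_delete(api, dataset_pids, dataverse_aliases):
    """Delete all Datasets first, then the emptied Dataverse collections.

    Note: delete_dataset() only removes draft Datasets; published ones
    require a separate destroy operation by a superuser.
    """
    for pid in dataset_pids:
        api.delete_dataset(pid)
    for alias in dataverse_aliases:
        api.delete_dataverse(alias)  # works only once the collection is empty

# Dry run with a stub that just records the calls, in order:
class _Recorder:
    def __init__(self):
        self.calls = []
    def delete_dataset(self, pid):
        self.calls.append(("dataset", pid))
    def delete_dataverse(self, alias):
        self.calls.append(("dataverse", alias))

recorder = _Recorder()
mass_delete(recorder, ["doi:10.5072/FK2/AAA"], ["test_collection"])
```

Deleting Datasets before their parent collections matters, because a Dataverse collection can only be deleted once it is empty.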

Data Science Pipeline

Using the APIs, you can retrieve data and/or metadata from a Dataverse installation, and you can use pyDataverse to automatically add data and metadata to your Datasets. pyDataverse connects your data science pipeline with your Dataverse installation.
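For example, a pipeline step might pull a Dataset's title out of the metadata returned by the Native API. The response shape below follows the Dataverse Native API's dataset JSON; the commented-out pyDataverse call and its arguments are hypothetical placeholders:

```python
def extract_title(dataset_json):
    """Pull the title field out of a Native API dataset response."""
    fields = (dataset_json["data"]["latestVersion"]
              ["metadataBlocks"]["citation"]["fields"])
    for field in fields:
        if field["typeName"] == "title":
            return field["value"]
    return None

# With pyDataverse, the response would come from something like:
#   api = NativeApi(base_url, api_token)
#   dataset_json = api.get_dataset(pid).json()
sample = {"data": {"latestVersion": {"metadataBlocks": {"citation": {
    "fields": [{"typeName": "title", "value": "My Study"}]}}}}}
title = extract_title(sample)
```

From here, the extracted metadata can flow into pandas, a plotting library, or any downstream analysis step.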

Web-Applications / Microservices

Because pyDataverse offers a direct, easy way to access the Dataverse APIs and to manipulate a Dataverse installation's data models, it integrates well into all kinds of web applications and microservices. For example, you can use pyDataverse to visualize data, run analyses, or enrich the data with other sources.