Developer Interface

This part of the documentation covers all the interfaces of pyDataverse. Where pyDataverse depends on external libraries, we document the most important parts here and provide links to the canonical documentation.

Api Interface

Find out more at https://github.com/AUSSDA/pyDataverse.

class Api(base_url, api_token=None, api_version='v1')[source]

API class.

Parameters:
  • base_url (string) – Base URL of the Dataverse instance, without a trailing slash, e.g. http://demo.dataverse.org
  • api_token (string) – Authentication token for the API.
  • api_version (string) – Dataverse API version. Defaults to v1.
conn_started

datetime – Time when Api() was instantiated, i.e. when the connection was established.

native_api_base_url

string – Url of Dataverse’s native Api.

base_url
api_token
api_version

Example

Create an Api connection:

>>> base_url = 'http://demo.dataverse.org'
>>> api = Api(base_url)
>>> api.status
'OK'
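
The native_api_base_url attribute is described above only as the URL of the native API, and the request helpers further down concatenate a query string to it. A minimal sketch of how such a URL could be assembled follows. The /api/<version> layout and both helper names are assumptions made for illustration, not the confirmed pyDataverse implementation.

```python
def build_native_api_base_url(base_url, api_version="v1"):
    """Join base URL and API version (hypothetical helper, assumed URL layout)."""
    # The parameter docs ask for a base_url without trailing slash; strip defensively.
    return "{0}/api/{1}".format(base_url.rstrip("/"), api_version)


def build_request_url(native_api_base_url, query_str):
    """Concatenate a query string such as '/info/version' to the native API URL."""
    return native_api_base_url + query_str


native = build_native_api_base_url("http://demo.dataverse.org/")
url = build_request_url(native, "/info/version")
print(url)
```
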
create_dataset(dataverse, metadata, auth=True)[source]

Add dataset to a dataverse.

http://guides.dataverse.org/en/latest/api/native-api.html#create-a-dataset-in-a-dataverse

POST http://$SERVER/api/dataverses/$dataverse/datasets --upload-file FILENAME

curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$DV_ALIAS/datasets/:import?pid=$PERSISTENT_IDENTIFIER&release=yes" --upload-file dataset.json
curl -H "X-Dataverse-key: $API_TOKEN" -X POST $SERVER_URL/api/dataverses/$DV_ALIAS/datasets --upload-file dataset-finch1.json

To create a dataset, you must create a JSON file containing all the metadata you want, such as in this example file: dataset-finch1.json (http://guides.dataverse.org/en/latest/_downloads/dataset-finch1.json). Then decide which dataverse to create the dataset in and target that dataverse with either the “alias” of the dataverse (e.g. “root”) or the database id of the dataverse (e.g. “1”). The initial version state will be set to DRAFT.

resp.status_code:
201: dataset created
Parameters:
  • dataverse (string) – Alias of the dataverse to which the dataset should be added.
  • metadata (string) – Metadata of the Dataset as a json-formatted string.
Returns:

Response object of requests library.

Return type:

requests.Response
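
The curl call above can be mirrored in Python. The sketch below only prepares the request with the standard library and does not send it, so it runs offline. The function name is hypothetical, the metadata is a placeholder rather than a valid dataset description, and pyDataverse itself returns a requests.Response instead of working with urllib.

```python
import json
import urllib.request


def build_create_dataset_request(server_url, dataverse_alias, metadata, api_token):
    """Prepare (but do not send) the POST request that creates a dataset."""
    url = "{0}/api/dataverses/{1}/datasets".format(
        server_url.rstrip("/"), dataverse_alias
    )
    return urllib.request.Request(
        url,
        data=metadata.encode("utf-8"),  # JSON-formatted string, as documented
        headers={"X-Dataverse-key": api_token},  # header name from the curl example
        method="POST",
    )


# Placeholder only; a real dataset needs the full structure of dataset-finch1.json.
metadata = json.dumps({"datasetVersion": {}})
request = build_create_dataset_request(
    "http://demo.dataverse.org", "root", metadata, "SECRET"
)
print(request.get_method(), request.full_url)
```

A successful call is signalled by status code 201, as listed above.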

create_dataverse(identifier, metadata, auth=True, parent=':root')[source]

Create a dataverse.

Generates a new dataverse under identifier. Expects JSON content describing the dataverse, as in the example file linked below. If identifier is omitted, a root dataverse is created. $id can either be a dataverse id (long) or a dataverse alias (more robust).

POST http://$SERVER/api/dataverses/$id?key=$apiKey

Download the JSON example file and modify it to create dataverses that suit your needs. The fields name, alias, and dataverseContacts are required. http://guides.dataverse.org/en/latest/_downloads/dataverse-complete.json

resp.status_code:
200: dataverse created
201: dataverse created
Parameters:
  • identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust).
  • metadata (string) – Metadata of the Dataverse as a json-formatted string.
  • auth (bool) – True if api authorization is necessary. Defaults to True.
  • parent (string) – Parent dataverse, if existing, to which the Dataverse gets attached. Defaults to :root.
Returns:

Response object of requests library.

Return type:

requests.Response
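
Since name, alias, and dataverseContacts are required, it can help to check them before sending anything. The following is a minimal sketch; the helper name is hypothetical, and the shape of dataverseContacts (a list of objects with a contactEmail key) is an assumption that should be verified against dataverse-complete.json.

```python
import json

REQUIRED_FIELDS = ("name", "alias", "dataverseContacts")


def build_dataverse_metadata(name, alias, contact_emails, **optional):
    """Return dataverse metadata as a JSON-formatted string."""
    metadata = {
        "name": name,
        "alias": alias,
        # Assumed shape: one object with a contactEmail key per contact.
        "dataverseContacts": [{"contactEmail": email} for email in contact_emails],
    }
    metadata.update(optional)
    # Reject empty or missing values for the three required fields.
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing required fields: {0}".format(", ".join(missing)))
    return json.dumps(metadata)


metadata_json = build_dataverse_metadata(
    "Scientific Research", "science", ["pi@example.edu"]
)
print(metadata_json)
```

The resulting string is what the metadata parameter of create_dataverse() expects.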

delete_dataset(identifier, auth=True)[source]

Delete a dataset.

Delete the dataset whose id is passed: DELETE http://$SERVER/api/datasets/$id?key=$apiKey

resp.status_code:
200: dataset deleted
Parameters:identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
Returns:Response object of requests library.
Return type:requests.Response
delete_dataverse(identifier, auth=True)[source]

Delete dataverse by alias or id.

Deletes the dataverse whose ID is given: DELETE http://$SERVER/api/dataverses/$id?key=$apiKey

resp.status_code:
200: dataverse deleted
Parameters:identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust).
Returns:Response object of requests library.
Return type:requests.Response
get_datafile(identifier)[source]

Download a datafile via the Dataverse Data Access API.

File ID
GET /api/access/datafile/$id
DOI
GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
Parameters:identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
Returns:Response object of requests library.
Return type:requests.Response
get_datafile_bundle(identifier)[source]

Download a datafile in all its formats via the Dataverse Data Access API.

GET /api/access/datafile/bundle/$id

Data Access API calls can now be made using persistent identifiers (in addition to database ids). This is done by passing the constant :persistentId where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name persistentId.

This is a convenience packaging method available for tabular data files. It returns a zipped bundle that contains the data in the following formats:
  • Tab-delimited;
  • “Saved Original”, the proprietary (SPSS, Stata, R, etc.) file from which the tabular data was ingested;
  • Generated R Data frame (unless the “original” above was in R);
  • Data (Variable) metadata record, in DDI XML;
  • File citation, in Endnote and RIS formats.

Parameters:identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
Returns:Response object of requests library.
Return type:requests.Response
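
Because the bundle arrives as a zip archive, its members can be inspected in memory. The sketch below builds a small stand-in archive so that it runs offline; the member names are invented for illustration. With a real response, the bytes would presumably come from the content attribute of the returned requests.Response.

```python
import io
import zipfile


def list_bundle_members(bundle_bytes):
    """Return the file names contained in a zipped bundle given as bytes."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return bundle.namelist()


# Build a stand-in archive in memory (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.tab", "id\tvalue\n1\t42\n")
    archive.writestr("data-ddi.xml", "<codeBook/>")

members = list_bundle_members(buffer.getvalue())
print(members)
```
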
get_datafiles(doi, version='1')[source]

List metadata of all datafiles of a dataset.

http://guides.dataverse.org/en/latest/api/native-api.html#list-files-in-a-dataset

GET http://$SERVER/api/datasets/$id/versions/$versionId/files?key=$apiKey

Parameters:
  • doi (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
  • version (string) – Version of dataset. Defaults to 1.
Returns:

Response object of requests library.

Return type:

requests.Response

get_dataset(identifier, auth=True, is_doi=True)[source]

Get metadata of dataset.

With Dataverse identifier:
GET http://$SERVER/api/datasets/$identifier
With PID:
GET http://$SERVER/api/datasets/:persistentId/?persistentId=$ID
GET http://$SERVER/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
Parameters:
  • identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
  • is_doi (bool) – Is the identifier a DOI? Defaults to True. So far, the module only supports DOIs as PIDs.
Returns:

Response object of requests library.

Return type:

requests.Response
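
The two addressing schemes above differ only in how the URL is built. A minimal sketch of that branching, using the standard library for query encoding, could look as follows. The function name is hypothetical and the code is not taken from pyDataverse.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_dataset_url(server_url, identifier, is_doi=True):
    """Build the metadata URL for a dataset, by PID or by database id."""
    base = server_url.rstrip("/")
    if is_doi:
        # The constant :persistentId replaces the numeric id; the PID is a query parameter.
        query = urlencode({"persistentId": identifier})
        return "{0}/api/datasets/:persistentId/?{1}".format(base, query)
    return "{0}/api/datasets/{1}".format(base, identifier)


pid_url = build_dataset_url("http://demo.dataverse.org", "doi:10.5072/FK2/J8SJZB")
id_url = build_dataset_url("http://demo.dataverse.org", "42", is_doi=False)
print(pid_url)
print(id_url)
# Decoding the query string recovers the original DOI.
print(parse_qs(urlsplit(pid_url).query))
```
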

get_dataset_export(identifier, export_format)[source]

Get metadata of dataset exported in different formats.

CORS Export the metadata of the current published version of a dataset in various formats:

GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId

Parameters:
  • identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
  • export_format (string) – Export format as a string. Formats: ‘ddi’, ‘oai_ddi’, ‘dcterms’, ‘oai_dc’, ‘schema.org’, ‘dataverse_json’.
Returns:

Response object of requests library.

Return type:

requests.Response
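
Only the export formats listed above are accepted, so a caller may want to validate the format before issuing the request. The following sketch shows one way to do that; the function name is an assumption, and whether pyDataverse performs such a check itself is not stated here.

```python
from urllib.parse import urlencode

# Formats as listed in the parameter description above.
EXPORT_FORMATS = ("ddi", "oai_ddi", "dcterms", "oai_dc", "schema.org", "dataverse_json")


def build_export_url(server_url, identifier, export_format):
    """Build the export URL, rejecting formats that are not documented."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError("unsupported export format: {0}".format(export_format))
    query = urlencode({"exporter": export_format, "persistentId": identifier})
    return "{0}/api/datasets/export?{1}".format(server_url.rstrip("/"), query)


export_url = build_export_url("http://demo.dataverse.org", "doi:10.11587/8H3N93", "ddi")
print(export_url)
```
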

get_dataverse(identifier, auth=False)[source]

Get dataverse metadata by alias or id.

View data about the dataverse identified by identifier. Identifier can be the id number of the dataverse, its alias, or the special value :root.

GET http://$SERVER/api/dataverses/$id

Parameters:identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust).
Returns:Response object of requests library.
Return type:requests.Response
get_info_apiTermsOfUse()[source]

Get API Terms of Use url.

The response contains the text value inserted as API Terms of use which uses the database setting :ApiTermsOfUse.

GET http://$SERVER/api/info/apiTermsOfUse

Returns:Response object of requests library.
Return type:requests.Response
get_info_server()[source]

Get dataverse server name.

This is useful when a Dataverse system is composed of multiple Java EE servers behind a load balancer.

GET http://$SERVER/api/info/server

Returns:Response object of requests library.
Return type:requests.Response
get_info_version()[source]

Get the Dataverse version and build number.

The response contains the version and build numbers.

Requires no api_token.

GET http://$SERVER/api/info/version

Returns:Response object of requests library.
Return type:requests.Response
get_metadatablock(identifier)[source]

Get info about single metadata block.

Returns data about the block whose identifier is passed. identifier can either be the block’s id, or its name.

GET http://$SERVER/api/metadatablocks/$identifier

Parameters:identifier (string) – Can be the block’s id or its name.
Returns:Response object of requests library.
Return type:requests.Response
get_metadatablocks()[source]

Get info about all metadata blocks.

Lists brief info about all metadata blocks registered in the system.

GET http://$SERVER/api/metadatablocks

Returns:Response object of requests library.
Return type:requests.Response
make_delete_request(query_str, auth=False, params=None)[source]

Make a DELETE request.

Parameters:
  • query_str (string) – Query string for the request. Will be concatenated to native_api_base_url.
  • auth (bool) – Should an api token be sent in the request. Defaults to False.
  • params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
Returns:

Response object of requests library.

Return type:

requests.Response

make_get_request(query_str, params=None, auth=False)[source]

Make a GET request.

Parameters:
  • query_str (string) – Query string for the request. Will be concatenated to native_api_base_url.
  • params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
  • auth (bool) – Should an api token be sent in the request. Defaults to False.
Returns:

Response object of requests library.

Return type:

requests.Response

make_post_request(query_str, metadata=None, auth=False, params=None)[source]

Make a POST request.

Parameters:
  • query_str (string) – Query string for the request. Will be concatenated to native_api_base_url.
  • metadata (string) – Metadata as a json-formatted string. Defaults to None.
  • auth (bool) – Should an api token be sent in the request. Defaults to False.
  • params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
Returns:

Response object of requests library.

Return type:

requests.Response

publish_dataset(identifier, type='minor', auth=True)[source]

Publish dataset.

Publishes the dataset whose id is passed. If this is the first version of the dataset, its version number will be set to 1.0. Otherwise, the new dataset version number is determined by the most recent version number and the type parameter. Passing type=minor increases the minor version number (2.3 is updated to 2.4). Passing type=major increases the major version number (2.3 is updated to 3.0). Superusers can pass type=updatecurrent to update metadata without changing the version number.

POST http://$SERVER/api/datasets/$id/actions/:publish?type=$type

When there are no default workflows, a successful publication process will result in 200 OK response. When there are workflows, it is impossible for Dataverse to know how long they are going to take and whether they will succeed or not (recall that some stages might require human intervention). Thus, a 202 ACCEPTED is returned immediately. To know whether the publication process succeeded or not, the client code has to check the status of the dataset periodically, or perform some push request in the post-publish workflow.

resp.status_code:
200: dataset published
Parameters:
  • identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
  • type (string) – Passing minor increases the minor version number (2.3 is updated to 2.4). Passing major increases the major version number (2.3 is updated to 3.0). Superusers can pass updatecurrent to update metadata without changing the version number.
  • auth (bool) – True if api authorization is necessary. Defaults to True.
Returns:

Response object of requests library.

Return type:

requests.Response
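
The versioning rules described above happen on the server, but they are easy to restate as a small function. This is only an illustrative model of those rules; the function name and the tuple representation of versions are assumptions.

```python
def next_version(current, publish_type="minor"):
    """Model the version number after publishing.

    current is a (major, minor) tuple, or None if the dataset was never published.
    """
    if current is None:
        return (1, 0)  # first published version is always 1.0
    major, minor = current
    if publish_type == "minor":
        return (major, minor + 1)
    if publish_type == "major":
        return (major + 1, 0)
    if publish_type == "updatecurrent":
        return (major, minor)  # metadata update, version unchanged
    raise ValueError("unknown publish type: {0}".format(publish_type))


print(next_version(None))
print(next_version((2, 3), "minor"))
print(next_version((2, 3), "major"))
```
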

publish_dataverse(identifier, auth=True)[source]

Publish a dataverse.

Publish the Dataverse pointed to by identifier, which can either be the dataverse alias or its numerical id.

POST http://$SERVER/api/dataverses/$identifier/actions/:publish

resp.status_code:
200: dataverse published
Parameters:
  • identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust).
  • auth (bool) – True if api authorization is necessary. Defaults to True.
Returns:

Response object of requests library.

Return type:

requests.Response

upload_file(identifier, filename)[source]

Add file to a dataset.

Add a file to an existing Dataset. Description and tags are optional: POST http://$SERVER/api/datasets/$id/add?key=$apiKey

The upload endpoint checks the content of the file, compares it with existing files, and reports whether it is already in the database (most likely via hashing).

Parameters:
  • identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
  • filename (string) – Full filename with path.
Returns:

The JSON string returned by the curl request, converted to a dict().

Return type:

dict
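
The duplicate check mentioned above is described only as "most likely via hashing". To illustrate the general idea, the sketch below computes a file checksum locally and compares it with a set of known checksums. The choice of SHA-256 and both function names are assumptions; the algorithm Dataverse actually uses is not stated here.

```python
import hashlib
import os
import tempfile


def file_checksum(filename, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(filename, known_checksums):
    """True if the file's checksum is already in the given collection."""
    return file_checksum(filename) in known_checksums


# Write a small temporary file so the sketch runs without external data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name
try:
    checksum = file_checksum(path)
    duplicate = is_duplicate(path, {checksum})
finally:
    os.remove(path)
print(checksum, duplicate)
```
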

Utils Interface

dict_to_json(data)[source]

Convert dict() to JSON-formatted string.

See more about the json module at https://docs.python.org/3.5/library/json.html

Parameters:data (dict) – Data as Python Dictionary.
Returns:Data as a json-formatted string.
Return type:string
json_to_dict(data)[source]

Convert JSON to a dict().

See more about the json module at https://docs.python.org/3.5/library/json.html

Parameters:data (string) – Data as a json-formatted string.
Returns:Data as Python Dictionary.
Return type:dict
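
Both conversion helpers point to the json module, so they can be pictured as thin wrappers around json.dumps and json.loads. The stand-ins below illustrate that idea and the round trip between the two; the options passed to json.dumps are assumptions, not the library's confirmed settings.

```python
import json


def dict_to_json_sketch(data):
    """Stand-in for dict_to_json(): serialize a dict to a JSON string."""
    return json.dumps(data, ensure_ascii=True, indent=2)


def json_to_dict_sketch(data):
    """Stand-in for json_to_dict(): parse a JSON string into a dict."""
    return json.loads(data)


original = {"alias": "science", "dataverseContacts": [{"contactEmail": "pi@example.edu"}]}
as_json = dict_to_json_sketch(original)
round_trip = json_to_dict_sketch(as_json)
print(round_trip == original)
```
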
read_file(filename, mode='r')[source]

Read in a file.

Parameters:
Returns:

Returns data as string.

Return type:

string

read_file_json(filename)[source]

Read in a json file.

See more about the json module at https://docs.python.org/3.5/library/json.html

Parameters:filename (string) – Filename with full path.
Returns:Data as a Python dictionary.
Return type:dict
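
Given the documented return type of dict, reading a JSON file amounts to opening the file and parsing its content. The sketch below writes a temporary file and reads it back with stand-in functions. The names, the UTF-8 encoding, and the indentation setting are assumptions for illustration.

```python
import json
import os
import tempfile


def write_file_json_sketch(filename, data, mode="w"):
    """Stand-in writer: serialize a dict to a JSON file."""
    with open(filename, mode, encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


def read_file_json_sketch(filename):
    """Stand-in reader: parse a JSON file and return its content as a dict."""
    with open(filename, "r", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataverse.json")
    write_file_json_sketch(path, {"alias": "science"})
    loaded = read_file_json_sketch(path)
print(loaded)
```
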
write_file(filename, data, mode='w')[source]

Write data in a file.

Parameters:
write_file_json(filename, data, mode='w')[source]

Write data to a json file.

Parameters:

Exceptions

exception ApiAuthorizationError[source]

Raised if a user provides invalid credentials.

exception ApiResponseError[source]

Raised when the requests response fails.

exception ApiUrlError[source]

Raised when the request url is not valid.

exception DatafileNotFoundError[source]

Raised when a Datafile cannot be found.

exception DatasetNotFoundError[source]

Raised when a Dataset cannot be found.

exception DataverseApiError[source]

Base exception class for Dataverse-related api error.

exception DataverseError[source]

Base exception class for Dataverse-related error.

exception DataverseNotEmptyError[source]

Raised when a Dataverse has accessioned Datasets.

exception DataverseNotFoundError[source]

Raised when a Dataverse cannot be found.

exception OperationFailedError[source]

Raised when an operation fails for an unknown reason.
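
To show how such exceptions are typically used, the sketch below defines local stand-ins with the same names and maps HTTP status codes to them. The inheritance chain beyond the two documented base classes and the status code mapping are assumptions for illustration, not the library's confirmed behaviour.

```python
class DataverseError(Exception):
    """Stand-in base class for Dataverse-related errors."""


class DataverseApiError(DataverseError):
    """Stand-in base class for API-related errors."""


class ApiAuthorizationError(DataverseApiError):
    """Stand-in: invalid credentials (assumed to derive from DataverseApiError)."""


class DatasetNotFoundError(DataverseApiError):
    """Stand-in: dataset not found (assumed parent class)."""


class OperationFailedError(DataverseApiError):
    """Stand-in: operation failed for an unknown reason (assumed parent class)."""


def raise_for_status_code(status_code):
    """Hypothetical mapping from HTTP status codes to exceptions."""
    if status_code == 401:
        raise ApiAuthorizationError("invalid or missing api token")
    if status_code == 404:
        raise DatasetNotFoundError("dataset could not be found")
    if status_code >= 400:
        raise OperationFailedError("request failed with {0}".format(status_code))


try:
    raise_for_status_code(404)
except DataverseError as error:
    # Catching the base class handles every more specific error as well.
    caught = type(error).__name__
print(caught)
```
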

Install

Install from the local git repository, with all its dependencies:

virtualenv venv
source venv/bin/activate
pip install -r tools/tests-requirements.txt
pip install -r tools/lint-requirements.txt
pip install -r tools/docs-requirements.txt
pip install -r tools/packaging-requirements.txt
pip install -e .

Testing

Before you can execute tests, you need a Dataverse account with an API token on a working Dataverse instance. We recommend using demo.dataverse.org. You can also use your own instance or any other, but beware: running the tests against a production instance can cause problems.

Before you can run the tests, you have to set the environment variables for the Dataverse API connection. This can be done by creating a pytest.ini file:

[pytest]
env =
    API_TOKEN=**SECRET**
    DATAVERSE_VERSION=4.14
    BASE_URL=https://demo.dataverse.org/

or define them manually in the terminal:

export API_TOKEN=**SECRET**
export DATAVERSE_VERSION=4.14
export BASE_URL=https://demo.dataverse.org/
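
Inside test code, these variables can be read from the environment. A minimal sketch follows, assuming a hypothetical helper name and an error on missing values. It also strips the trailing slash from BASE_URL, because the Api class documents base_url as having no trailing slash.

```python
import os


def load_test_settings(environ=None):
    """Read the Dataverse connection settings from environment variables."""
    environ = os.environ if environ is None else environ
    missing = [key for key in ("API_TOKEN", "BASE_URL") if not environ.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: {0}".format(", ".join(missing)))
    return {
        "api_token": environ["API_TOKEN"],
        # Api expects a base URL without trailing slash.
        "base_url": environ["BASE_URL"].rstrip("/"),
        "dataverse_version": environ.get("DATAVERSE_VERSION"),
    }


# A plain dict stands in for os.environ so the sketch runs anywhere.
settings = load_test_settings(
    {
        "API_TOKEN": "SECRET",
        "DATAVERSE_VERSION": "4.14",
        "BASE_URL": "https://demo.dataverse.org/",
    }
)
print(settings["base_url"])
```
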

To run through all tests (e.g. different Python versions, packaging, docs, flake8, etc.), simply call tox from the root directory:

tox

To run only one test environment, e.g. the py36 test:

tox -e py36

To find out more about which tests are available, have a look inside the tox.ini file.

Documentation

Create Sphinx Docs

Use Sphinx to create class and function documentation from the docstrings. You can call it via tox. This writes the generated docs to docs/build.

tox -e docs

Create Coverage Reports

Run tests with coverage to create HTML and XML reports as output. Again, call it via tox. This writes the generated reports to docs/coverage_html/.

tox -e coverage

Run Coveralls

To run Coveralls during local development:

tox -e coveralls