Developer Interface¶
This part of the documentation covers all the interfaces of pyDataverse. For parts where pyDataverse depends on external libraries, we document the most important ones right here and provide links to the canonical documentation.
Api Interface¶
Find out more at https://github.com/AUSSDA/pyDataverse.
-
class Api(base_url, api_token=None, api_version='v1')[source]¶ API class.
Parameters: - base_url (string) – Base URL of the Dataverse instance, without a trailing /, e.g. http://demo.dataverse.org
- api_token (string) – Authentication token for the api.
- api_version (string) – Dataverse api version. Defaults to v1.
-
conn_started¶ datetime – Time when Api() was instantiated and the connection was established.
-
native_api_base_url¶ string – Url of Dataverse’s native Api.
-
base_url¶
-
api_token¶
-
api_version¶
Example
Create an Api connection:
>>> base_url = 'http://demo.dataverse.org'
>>> api = Api(base_url)
>>> api.status
'OK'
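The attributes above suggest that native_api_base_url is derived from base_url and api_version. The following is a minimal sketch of such a derivation, assuming the native API lives under /api/<version>. The helper name build_native_api_base_url is hypothetical and not part of pyDataverse.

```python
def build_native_api_base_url(base_url, api_version='v1'):
    """Join base URL and API version (hypothetical helper, for illustration)."""
    # The Api class asks for a base URL without a trailing slash, so one is
    # stripped defensively before joining. The /api/<version> layout is an
    # assumption of this sketch.
    return '{0}/api/{1}'.format(base_url.rstrip('/'), api_version)


native_url = build_native_api_base_url('http://demo.dataverse.org')
print(native_url)
```

Stripping the trailing slash keeps the result free of a double slash even when the caller passes http://demo.dataverse.org/.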
-
create_dataset(dataverse, metadata, auth=True)[source]¶ Add dataset to a dataverse.
http://guides.dataverse.org/en/latest/api/native-api.html#create-a-dataset-in-a-dataverse
POST http://$SERVER/api/dataverses/$dataverse/datasets --upload-file FILENAME
curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$DV_ALIAS/datasets/:import?pid=$PERSISTENT_IDENTIFIER&release=yes" --upload-file dataset.json
curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$DV_ALIAS/datasets" --upload-file dataset-finch1.json
To create a dataset, you must create a JSON file containing all the metadata you want, such as in this example file: dataset-finch1.json (http://guides.dataverse.org/en/latest/_downloads/dataset-finch1.json). Then, you must decide which dataverse to create the dataset in and target that dataverse with either the “alias” of the dataverse (e.g. “root”) or the database id of the dataverse (e.g. “1”). The initial version state will be set to DRAFT.
- resp.status_code:
- 201: dataset created
Parameters: - dataverse (string) – Alias of the dataverse to which the dataset should be added.
- metadata (string) – Metadata of the Dataset as a json-formatted string.
- auth (bool) – True if api authorization is necessary. Defaults to True.
Returns: Response object of requests library.
Return type: requests.Response
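To make the request described above more concrete, here is a minimal sketch that prepares, but does not send, the POST request using only the standard library. It is not how pyDataverse itself is implemented. The helper name build_create_dataset_request and the placeholder metadata are assumptions made for illustration.

```python
import json
import urllib.request


def build_create_dataset_request(server_url, dataverse_alias, metadata, api_token):
    """Prepare (but do not send) a POST request that creates a dataset."""
    url = '{0}/api/dataverses/{1}/datasets'.format(
        server_url.rstrip('/'), dataverse_alias)
    return urllib.request.Request(
        url,
        data=metadata.encode('utf-8'),  # JSON-formatted string as request body
        headers={
            'X-Dataverse-key': api_token,  # header name taken from the curl call
            'Content-Type': 'application/json',
        },
        method='POST',
    )


# Placeholder metadata only; a real dataset needs the full structure shown in
# the dataset-finch1.json example file.
metadata = json.dumps({'datasetVersion': {}})
request = build_create_dataset_request(
    'http://demo.dataverse.org', 'root', metadata, 'SECRET')
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen() would send it. A 201 status code would then indicate that the dataset was created.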
-
create_dataverse(identifier, metadata, auth=True, parent=':root')[source]¶ Create a dataverse.
Generates a new dataverse under identifier. Expects JSON content describing the dataverse, as in the example file below. If identifier is omitted, a root dataverse is created. $id can either be a dataverse id (long) or a dataverse alias (more robust).
POST http://$SERVER/api/dataverses/$id?key=$apiKey
Download the JSON example file and modify it to create dataverses that suit your needs. The fields name, alias, and dataverseContacts are required. http://guides.dataverse.org/en/latest/_downloads/dataverse-complete.json
- resp.status_code:
- 200: dataverse created
- 201: dataverse created
Parameters: - identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust).
- metadata (string) – Metadata of the Dataverse as a json-formatted string.
- auth (bool) – True if api authorization is necessary. Defaults to True.
- parent (string) – Parent dataverse, if existing, to which the Dataverse gets attached. Defaults to :root.
Returns: Response object of requests library.
Return type: requests.Response
-
delete_dataset(identifier, auth=True)[source]¶ Delete a dataset.
Delete the dataset whose id is passed: DELETE http://$SERVER/api/datasets/$id?key=$apiKey
- resp.status_code:
- 200: dataset deleted
Parameters: identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93. Returns: Response object of requests library. Return type: requests.Response
-
delete_dataverse(identifier, auth=True)[source]¶ Delete dataverse by alias or id.
Deletes the dataverse whose ID is given: DELETE http://$SERVER/api/dataverses/$id?key=$apiKey
- resp.status_code:
- 200: dataverse deleted
Parameters: identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust). Returns: Response object of requests library. Return type: requests.Response
-
get_datafile(identifier)[source]¶ Download a datafile via the Dataverse Data Access API.
- File ID
- GET /api/access/datafile/$id
- DOI
- GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
Parameters: identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93. Returns: Response object of requests library. Return type: requests.Response
-
get_datafile_bundle(identifier)[source]¶ Download a datafile in all its formats via the Dataverse Data Access API.
GET /api/access/datafile/bundle/$id
Data Access API calls can now be made using persistent identifiers (in addition to database ids). This is done by passing the constant :persistentId where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name persistentId.
This is a convenience packaging method available for tabular data files. It returns a zipped bundle that contains the data in the following formats:
- Tab-delimited;
- “Saved Original”, the proprietary (SPSS, Stata, R, etc.) file from which the tabular data was ingested;
- Generated R Data frame (unless the “original” above was in R);
- Data (Variable) metadata record, in DDI XML;
- File citation, in Endnote and RIS formats.
Parameters: identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93. Returns: Response object of requests library. Return type: requests.Response
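The :persistentId convention described above can be sketched as follows. This is an illustration under stated assumptions, not pyDataverse's own code. The helper name build_datafile_url and its is_pid flag are hypothetical.

```python
from urllib.parse import urlencode


def build_datafile_url(server_url, identifier, is_pid=False):
    """Build a Data Access API URL for a file id or a persistent id."""
    base = '{0}/api/access/datafile'.format(server_url.rstrip('/'))
    if is_pid:
        # The constant :persistentId takes the place of the numeric file id;
        # the actual persistent id travels as a query parameter.
        query = urlencode({'persistentId': identifier})
        return '{0}/:persistentId/?{1}'.format(base, query)
    return '{0}/{1}'.format(base, identifier)


by_id = build_datafile_url('http://demo.dataverse.org', 42)
by_pid = build_datafile_url(
    'http://demo.dataverse.org', 'doi:10.5072/FK2/J8SJZB', is_pid=True)
print(by_id)
print(by_pid)
```

Note that urlencode() percent-encodes the colon and slashes of the DOI, so the query string in the sketch differs visually from the unencoded form shown in the GET example.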
-
get_datafiles(doi, version='1')[source]¶ List metadata of all datafiles of a dataset.
http://guides.dataverse.org/en/latest/api/native-api.html#list-files-in-a-dataset
GET http://$SERVER/api/datasets/$id/versions/$versionId/files?key=$apiKey
Parameters: - doi (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
- version (string) – Version of dataset. Defaults to 1.
Returns: Response object of requests library.
Return type: requests.Response
-
get_dataset(identifier, auth=True, is_doi=True)[source]¶ Get metadata of dataset.
- With Dataverse identifier:
- GET http://$SERVER/api/datasets/$identifier
- With PID:
- GET http://$SERVER/api/datasets/:persistentId/?persistentId=$ID
- GET http://$SERVER/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
Parameters: - identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
- is_doi (bool) – Is the identifier a DOI? Defaults to True. So far, the module only supports DOIs as PIDs.
Returns: Response object of requests library.
Return type: requests.Response
-
get_dataset_export(identifier, export_format)[source]¶ Get metadata of dataset exported in different formats.
Export the metadata of the current published version of a dataset in various formats:
GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId
Parameters: - identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
- export_format (string) – Export format as a string. Formats: ‘ddi’, ‘oai_ddi’, ‘dcterms’, ‘oai_dc’, ‘schema.org’, ‘dataverse_json’.
Returns: Response object of requests library.
Return type: requests.Response
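Since only a fixed set of export formats is listed, it can help to validate the format before building the request. Below is a minimal sketch of that idea, assuming the URL layout from the GET example. The helper name build_export_url and the EXPORT_FORMATS constant are hypothetical and not part of pyDataverse.

```python
from urllib.parse import urlencode

# Formats as listed in the parameter description of get_dataset_export().
EXPORT_FORMATS = (
    'ddi', 'oai_ddi', 'dcterms', 'oai_dc', 'schema.org', 'dataverse_json',
)


def build_export_url(server_url, identifier, export_format):
    """Build the export URL, rejecting formats that are not listed."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError('Unknown export format: {0}'.format(export_format))
    query = urlencode({'exporter': export_format, 'persistentId': identifier})
    return '{0}/api/datasets/export?{1}'.format(server_url.rstrip('/'), query)


export_url = build_export_url(
    'http://demo.dataverse.org', 'doi:10.11587/8H3N93', 'ddi')
print(export_url)
```

Failing early with a ValueError avoids sending a request that the server would reject anyway.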
-
get_dataverse(identifier, auth=False)[source]¶ Get dataverse metadata by alias or id.
View data about the dataverse identified by identifier. The identifier can be the id number of the dataverse, its alias, or the special value :root.
GET http://$SERVER/api/dataverses/$id
Parameters: identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust). Returns: Response object of requests library. Return type: requests.Response
-
get_info_apiTermsOfUse()[source]¶ Get API Terms of Use url.
The response contains the text value inserted as API Terms of use which uses the database setting :ApiTermsOfUse.
GET http://$SERVER/api/info/apiTermsOfUse
Returns: Response object of requests library. Return type: requests.Response
-
get_info_server()[source]¶ Get dataverse server name.
This is useful when a Dataverse system is composed of multiple Java EE servers behind a load balancer.
GET http://$SERVER/api/info/server
Returns: Response object of requests library. Return type: requests.Response
-
get_info_version()[source]¶ Get the Dataverse version and build number.
The response contains the version and build numbers.
Requires no api_token.
GET http://$SERVER/api/info/version
Returns: Response object of requests library. Return type: requests.Response
-
get_metadatablock(identifier)[source]¶ Get info about single metadata block.
Returns data about the block whose identifier is passed. identifier can either be the block’s id, or its name.
GET http://$SERVER/api/metadatablocks/$identifier
Parameters: identifier (string) – Can be the block’s id or its name. Returns: Response object of requests library. Return type: requests.Response
-
get_metadatablocks()[source]¶ Get info about all metadata blocks.
Lists brief info about all metadata blocks registered in the system.
GET http://$SERVER/api/metadatablocks
Returns: Response object of requests library. Return type: requests.Response
-
make_delete_request(query_str, auth=False, params=None)[source]¶ Make a DELETE request.
Parameters: - query_str (string) – Query string for the request. Will be concatenated to native_api_base_url.
- auth (bool) – Should an api token be sent in the request. Defaults to False.
- params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
Returns: Response object of requests library.
Return type: requests.Response
-
make_get_request(query_str, params=None, auth=False)[source]¶ Make a GET request.
Parameters: - query_str (string) – Query string for the request. Will be concatenated to native_api_base_url.
- params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
- auth (bool) – Should an api token be sent in the request. Defaults to False.
Returns: Response object of requests library.
Return type: requests.Response
-
make_post_request(query_str, metadata=None, auth=False, params=None)[source]¶ Make a POST request.
Parameters: - query_str (string) – Query string for the request. Will be concatenated to native_api_base_url.
- metadata (string) – Metadata as a json-formatted string. Defaults to None.
- auth (bool) – Should an api token be sent in the request. Defaults to False.
- params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
Returns: Response object of requests library.
Return type: requests.Response
-
publish_dataset(identifier, type='minor', auth=True)[source]¶ Publish dataset.
Publishes the dataset whose id is passed. If this is the first version of the dataset, its version number will be set to 1.0. Otherwise, the new dataset version number is determined by the most recent version number and the type parameter. Passing type=minor increases the minor version number (2.3 is updated to 2.4). Passing type=major increases the major version number (2.3 is updated to 3.0). Superusers can pass type=updatecurrent to update metadata without changing the version number.
POST http://$SERVER/api/datasets/$id/actions/:publish?type=$type
When there are no default workflows, a successful publication process will result in 200 OK response. When there are workflows, it is impossible for Dataverse to know how long they are going to take and whether they will succeed or not (recall that some stages might require human intervention). Thus, a 202 ACCEPTED is returned immediately. To know whether the publication process succeeded or not, the client code has to check the status of the dataset periodically, or perform some push request in the post-publish workflow.
- resp.status_code:
- 200: dataset published
Parameters: - identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
- type (string) – Passing minor increases the minor version number (2.3 is updated to 2.4). Passing major increases the major version number (2.3 is updated to 3.0). Superusers can pass updatecurrent to update metadata without changing the version number.
- auth (bool) – True if api authorization is necessary. Defaults to True.
Returns: Response object of requests library.
Return type: requests.Response
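The version numbering rule described above is applied by the Dataverse server, not by pyDataverse. The following sketch only mirrors that rule so the effect of the type parameter is easy to see. The function name next_version and the tuple representation of versions are assumptions made for illustration.

```python
def next_version(current, publish_type='minor'):
    """Return the (major, minor) version that publishing would produce.

    current is a (major, minor) tuple, or None if the dataset has not been
    published before.
    """
    if current is None:
        # The first published version is always 1.0.
        return (1, 0)
    major, minor = current
    if publish_type == 'minor':
        return (major, minor + 1)
    if publish_type == 'major':
        return (major + 1, 0)
    if publish_type == 'updatecurrent':
        # Metadata is updated, the version number stays the same.
        return (major, minor)
    raise ValueError('Unknown publish type: {0}'.format(publish_type))


print(next_version(None))
print(next_version((2, 3), 'minor'))
print(next_version((2, 3), 'major'))
```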
-
publish_dataverse(identifier, auth=True)[source]¶ Publish a dataverse.
Publish the Dataverse pointed to by identifier, which can either be the dataverse alias or its numerical id.
POST http://$SERVER/api/dataverses/$identifier/actions/:publish
- resp.status_code:
- 200: dataverse published
Parameters: - identifier (string) – Can either be a dataverse id (long) or a dataverse alias (more robust).
- auth (bool) – True if api authorization is necessary. Defaults to True.
Returns: Response object of requests library.
Return type: requests.Response
-
upload_file(identifier, filename)[source]¶ Add file to a dataset.
Add a file to an existing Dataset. Description and tags are optional: POST http://$SERVER/api/datasets/$id/add?key=$apiKey
The upload endpoint checks the content of the file, compares it with existing files and reports whether it is already in the database (most likely via hashing).
Parameters: - identifier (string) – Doi of the dataset. e.g. doi:10.11587/8H3N93.
- filename (string) – Full filename with path.
Returns: The JSON string returned by the curl request, converted to a dict().
Return type: dict
Utils Interface¶
Find out more at https://github.com/AUSSDA/pyDataverse.
-
dict_to_json(data)[source]¶ Convert dict() to JSON-formatted string.
See more about the json module at https://docs.python.org/3.5/library/json.html
Parameters: data (dict) – Data as Python Dictionary. Returns: Data as a json-formatted string. Return type: string
-
json_to_dict(data)[source]¶ Convert JSON to a dict().
See more about the json module at https://docs.python.org/3.5/library/json.html
Parameters: data (string) – Data as a json-formatted string. Returns: Data as Python Dictionary. Return type: dict
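Both conversion helpers point to the standard json module. A minimal sketch of an equivalent round trip is shown below. The functions to_json and from_json are hypothetical stand-ins for illustration, not the actual bodies of dict_to_json() and json_to_dict().

```python
import json


def to_json(data):
    """Serialize a dict to a JSON-formatted string (illustrative stand-in)."""
    return json.dumps(data)


def from_json(data):
    """Parse a JSON-formatted string into a dict (illustrative stand-in)."""
    return json.loads(data)


dataverse = {'name': 'Example Dataverse', 'alias': 'example'}
as_string = to_json(dataverse)
round_trip = from_json(as_string)
print(type(as_string).__name__, round_trip == dataverse)
```

The string form is what the metadata parameters of create_dataverse() and create_dataset() expect.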
-
read_file(filename, mode='r')[source]¶ Read in a file.
Parameters: - filename (string) – Filename with full path.
- mode (string) – Read mode of file. Defaults to r. See more at https://docs.python.org/3.5/library/functions.html#open
Returns: Returns data as string.
Return type: string
-
read_file_json(filename)[source]¶ Read in a json file.
See more about the json module at https://docs.python.org/3.5/library/json.html
Parameters: filename (string) – Filename with full path. Returns: Data as Python Dictionary. Return type: dict
-
write_file(filename, data, mode='w')[source]¶ Write data in a file.
Parameters: - filename (string) – Filename with full path.
- data (string) – Data to be stored.
- mode (string) – Write mode of file. Defaults to w. See more at https://docs.python.org/3.5/library/functions.html#open
-
write_file_json(filename, data, mode='w')[source]¶ Write data to a json file.
Parameters: - filename (string) – Filename with full path.
- data (dict) – Data to be written in the json file.
- mode (string) – Write mode of file. Defaults to w. See more at https://docs.python.org/3/library/functions.html#open
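The file helpers above can be pictured as thin wrappers around open() and the json module. The sketch below writes a dict to a temporary JSON file and reads it back. The functions write_json and read_json are hypothetical stand-ins and may differ from the real write_file_json() and read_file_json().

```python
import json
import os
import tempfile


def write_json(filename, data, mode='w'):
    """Write a dict to a JSON file (illustrative stand-in)."""
    with open(filename, mode) as json_file:
        json.dump(data, json_file, indent=2)


def read_json(filename):
    """Read a JSON file and return its content as a dict."""
    with open(filename, 'r') as json_file:
        return json.load(json_file)


# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dataverse.json')
    original = {'name': 'Example Dataverse', 'alias': 'example'}
    write_json(path, original)
    restored = read_json(path)

print(restored == original)
```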
Exceptions¶
Find out more at https://github.com/AUSSDA/pyDataverse.
Install¶
Install from the local git repository, with all its dependencies:
virtualenv venv
source venv/bin/activate
pip install -r tools/tests-requirements.txt
pip install -r tools/lint-requirements.txt
pip install -r tools/docs-requirements.txt
pip install -r tools/packaging-requirements.txt
pip install -e .
Testing¶
Before you can execute the tests, you need a Dataverse account with an api token on a working Dataverse instance. We recommend using demo.dataverse.org. You can also use your own instance or any other, but beware: running the tests against a production instance can cause problems.
Before you can run the tests, you have to set the ENV variables for the Dataverse Api connection. This can be done by creating a pytest.ini file:
[pytest]
env =
API_TOKEN=**SECRET**
DATAVERSE_VERSION=4.14
BASE_URL=https://demo.dataverse.org/
or define them manually in the terminal:
export API_TOKEN=**SECRET**
export DATAVERSE_VERSION=4.14
export BASE_URL=https://demo.dataverse.org/
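How the test suite consumes these variables is not shown here. As a rough sketch, a test helper could read them like this. The function name load_test_settings is hypothetical, and the trailing-slash handling is an assumption based on the Api class asking for a base URL without one.

```python
import os

REQUIRED_VARIABLES = ('API_TOKEN', 'DATAVERSE_VERSION', 'BASE_URL')


def load_test_settings(environ=None):
    """Collect the connection settings from environment variables."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {
        'api_token': environ['API_TOKEN'],
        'dataverse_version': environ['DATAVERSE_VERSION'],
        # Api() expects a base URL without a trailing slash.
        'base_url': environ['BASE_URL'].rstrip('/'),
    }


# A plain dict stands in for os.environ so the example runs anywhere.
settings = load_test_settings({
    'API_TOKEN': 'SECRET',
    'DATAVERSE_VERSION': '4.14',
    'BASE_URL': 'https://demo.dataverse.org/',
})
print(settings['base_url'])
```

Calling load_test_settings() without arguments reads from the real environment and raises a KeyError naming any variable that is not set.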
To run through all tests (e.g. different Python versions, packaging, docs, flake8, etc.), simply call tox from the root directory:
tox
When you only want to run one test environment, e.g. py36:
tox -e py36
To find out more about which tests are available, have a look inside the tox.ini file.
Documentation¶
Create Sphinx Docs
Use Sphinx to create class and function documentation out of the doc-strings. You can call it via tox. This creates the docs inside docs/build.
tox -e docs
Create Coverage Reports
Run tests with coverage to create html and xml reports as an output. Again, call it via tox. This creates the reports inside docs/coverage_html/.
tox -e coverage
Run Coveralls
To run Coveralls in local development:
tox -e coveralls