Reference / API¶
This part of the documentation covers all the interfaces / APIs of the pyDataverse modules.
Where pyDataverse depends on external libraries, we document the most important ones here and link to their canonical documentation.
API Interface¶
Access all of Dataverse APIs.
Dataverse API wrapper for all its APIs.
-
class
Api
(base_url: str, api_token: str = None, api_version: str = 'latest')[source]¶ Base class.
Parameters: -
base_url
¶
-
api_token
¶
-
dataverse_version
¶
-
delete_request
(url, auth=False, params=None)[source]¶ Make a DELETE request.
Parameters: Returns: Response object of requests library.
Return type:
-
get_request
(url, params=None, auth=False)[source]¶ Make a GET request.
Parameters: Returns: class – Response object of requests library.
Return type: requests.Response
-
post_request
(url, data=None, auth=False, params=None, files=None)[source]¶ Make a POST request.
params will be added as key-value pairs to the URL.
Parameters: - url (str) – Full URL.
- data (str) – Metadata as a json-formatted string. Defaults to None.
- auth (bool) – Should an api token be sent in the request. Defaults to False.
- files (dict) –
- files = {‘file’: open(‘sample_file.txt’,’rb’)}
- params (dict) – Dictionary of parameters to be passed with the request. Defaults to None.
Returns: Response object of requests library.
Return type:
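How params ends up in the URL can be sketched with the standard library alone. The helper name add_params is hypothetical and not part of pyDataverse; it only illustrates the key-value encoding, assuming the values are plain strings.

```python
from urllib.parse import urlencode


def add_params(url, params=None):
    """Append a dict of query parameters to a URL (illustrative helper)."""
    if not params:
        return url
    # Use "&" if the URL already carries a query string, "?" otherwise.
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(params)


full_url = add_params("https://demo.dataverse.org/api/datasets/42/add", {"key": "value"})
```

In practice the requests library performs this encoding itself when a dict is passed as params, so a helper like this is only needed to reason about the resulting URL.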
-
-
class
DataAccessApi
(base_url, api_token=None)[source]¶ Class to access Dataverse’s Data Access API.
-
base_url_api_data_access
¶ type – Description of attribute base_url_api_data_access.
-
base_url
¶ type – Description of attribute base_url.
-
allow_access_request
(identifier, do_allow=True, auth=True, is_pid=True)[source]¶ Allow access request for datafiles.
https://guides.dataverse.org/en/latest/api/dataaccess.html#allow-access-requests
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true http://$SERVER/api/access/{id}/allowAccessRequest
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true http://$SERVER/api/access/:persistentId/allowAccessRequest?persistentId={pid}
-
get_datafile
(identifier, data_format=None, no_var_header=None, image_thumb=None, is_pid=True, auth=False)[source]¶ Download a datafile via the Dataverse Data Access API.
Get by file id (HTTP Request).
GET /api/access/datafile/$id
Get by persistent identifier (HTTP Request).
GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
Parameters: Returns: Response object of requests library.
Return type:
-
get_datafile_bundle
(identifier, file_metadata_id=None, auth=False)[source]¶ Download a datafile in all its formats.
HTTP Request:
GET /api/access/datafile/bundle/$id
Data Access API calls can now be made using persistent identifiers (in addition to database ids). This is done by passing the constant :persistentId where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name persistentId.
This is a convenience packaging method available for tabular data files. It returns a zipped bundle that contains the data in the following formats: - Tab-delimited; - “Saved Original”, the proprietary (SPSS, Stata, R, etc.) file from which the tabular data was ingested; - Generated R Data frame (unless the “original” above was in R); - Data (Variable) metadata record, in DDI XML; - File citation, in Endnote and RIS formats.
Parameters: identifier (str) – Identifier of the dataset. Returns: Response object of requests library. Return type: requests.Response
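The persistent identifier convention described above can be sketched as a small URL builder. The function name datafile_url is an illustrative assumption and not part of pyDataverse; the two URL shapes are the ones listed under get_datafile.

```python
from urllib.parse import urlencode


def datafile_url(server, identifier, is_pid=True):
    """Build a Data Access URL for a database id or a persistent id."""
    prefix = f"{server}/api/access/datafile"
    if is_pid:
        # The constant ":persistentId" takes the place of the numeric id;
        # the real identifier travels as the "persistentId" query parameter.
        query = urlencode({"persistentId": identifier}, safe=":/")
        return f"{prefix}/:persistentId/?{query}"
    return f"{prefix}/{identifier}"
```

Keeping ":" and "/" unescaped is a readability choice for this sketch; a percent-encoded DOI is an equally valid query value.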
-
get_datafiles
(identifier, data_format=None, auth=False)[source]¶ Download multiple datafiles via the Dataverse Data Access API.
Get by file id (HTTP Request).
GET /api/access/datafiles/$id1,$id2,...$idN
Get by persistent identifier (HTTP Request).
Parameters: identifier (str) – Identifier of the dataset. Can be datafile id or persistent identifier of the datafile (e. g. doi). Returns: Response object of requests library. Return type: requests.Response
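The multi-file request path shown above is a comma-separated list of ids. A minimal sketch of building it, assuming the ids are database ids; the name datafiles_url is hypothetical and not part of pyDataverse:

```python
def datafiles_url(server, file_ids):
    """Join ids with commas, as in /api/access/datafiles/$id1,$id2,...$idN."""
    if not file_ids:
        raise ValueError("at least one file id is required")
    joined = ",".join(str(file_id) for file_id in file_ids)
    return f"{server}/api/access/datafiles/{joined}"
```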
-
grant_file_access
(identifier, user, auth=False)[source]¶ Grant datafile access.
https://guides.dataverse.org/en/4.18.1/api/dataaccess.html#grant-file-access
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT http://$SERVER/api/access/datafile/{id}/grantAccess/{@userIdentifier}
-
list_file_access_requests
(identifier, auth=False)[source]¶ List datafile access requests.
https://guides.dataverse.org/en/4.18.1/api/dataaccess.html#list-file-access-requests
curl -H "X-Dataverse-key:$API_TOKEN" -X GET http://$SERVER/api/access/datafile/{id}/listRequests
-
request_access
(identifier, auth=True, is_filepid=False)[source]¶ Request datafile access.
This method requests access to the datafile whose id is passed, on behalf of the authenticated user whose key is passed. Note that not all datasets allow access requests to restricted files.
https://guides.dataverse.org/en/4.18.1/api/dataaccess.html#request-access
/api/access/datafile/$id/requestAccess
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT http://$SERVER/api/access/datafile/{id}/requestAccess
-
-
class
MetricsApi
(base_url, api_token=None, api_version='latest')[source]¶ Class to access Dataverse’s Metrics API.
-
base_url_api_metrics
¶ type – Description of attribute base_url_api_metrics.
-
base_url
¶ type – Description of attribute base_url.
-
get_datasets_by_data_location
(data_location, auth=False)[source]¶ GET https://$SERVER/api/info/metrics/datasets/bySubject
$type can be set to dataverses, datasets, files or downloads.
-
get_datasets_by_subject
(date_str=None, auth=False)[source]¶ GET https://$SERVER/api/info/metrics/datasets/bySubject
$type can be set to dataverses, datasets, files or downloads.
-
get_dataverses_by_category
(auth=False)[source]¶ GET https://$SERVER/api/info/metrics/dataverses/byCategory
$type can be set to dataverses, datasets, files or downloads.
-
get_dataverses_by_subject
(auth=False)[source]¶ GET https://$SERVER/api/info/metrics/dataverses/bySubject
$type can be set to dataverses, datasets, files or downloads.
-
past_days
(data_type, days_str, auth=False)[source]¶ http://guides.dataverse.org/en/4.18.1/api/metrics.html
GET https://$SERVER/api/info/metrics/$type/pastDays/$days
$type can be set to dataverses, datasets, files or downloads.
-
total
(data_type, date_str=None, auth=False)[source]¶ GET https://$SERVER/api/info/metrics/$type
GET https://$SERVER/api/info/metrics/$type/toMonth/$YYYY-DD
$type can be set to dataverses, datasets, files or downloads.
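The three URL shapes used by total and past_days can be sketched in one builder. The function name metrics_url and its keyword arguments are assumptions made for illustration; the date string is passed through unchanged, and only the allowed $type values come from the text above.

```python
ALLOWED_TYPES = ("dataverses", "datasets", "files", "downloads")


def metrics_url(server, data_type, date_str=None, days=None):
    """Build a Metrics API URL for a total, a to-month total, or past days."""
    if data_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {data_type!r}")
    if date_str is not None and days is not None:
        raise ValueError("pass either date_str or days, not both")
    url = f"{server}/api/info/metrics/{data_type}"
    if date_str is not None:
        return f"{url}/toMonth/{date_str}"
    if days is not None:
        return f"{url}/pastDays/{days}"
    return url
```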
-
-
class
NativeApi
(base_url: str, api_token=None, api_version='v1')[source]¶ Class to access Dataverse’s Native API.
Parameters: -
base_url_api_native
¶ type – Description of attribute base_url_api_native.
-
base_url_api
¶ type – Description of attribute base_url_api.
-
create_dataset
(dataverse, metadata, pid=None, publish=False, auth=True)[source]¶ Add dataset to a dataverse.
HTTP Request:
POST http://$SERVER/api/dataverses/$dataverse/datasets --upload-file FILENAME
Add new dataset with curl:
curl -H "X-Dataverse-key: $API_TOKEN" -X POST $SERVER_URL/api/dataverses/$DV_ALIAS/datasets --upload-file tests/data/dataset_min.json
Import dataset with existing persistent identifier with curl:
curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$DV_ALIAS/datasets/:import?pid=$PERSISTENT_IDENTIFIER&release=yes" --upload-file tests/data/dataset_min.json
To create a dataset, you must create a JSON file containing all the metadata you want, such as the example file dataset-finch1.json. Then decide which dataverse to create the dataset in, and target that dataverse with either its “alias” (e.g. “root”) or its database id (e.g. “1”). The initial version state will be set to “DRAFT”.
- Status Code:
- 201: dataset created
Import Dataset with existing PID: http://guides.dataverse.org/en/latest/api/native-api.html#import-a-dataset-into-a-dataverse To import a dataset with an existing persistent identifier (PID), the dataset’s metadata should be prepared in Dataverse’s native JSON format. The PID is provided as a parameter at the URL. The following line imports a dataset with the PID PERSISTENT_IDENTIFIER to Dataverse, and then releases it:
The pid parameter holds a persistent identifier (such as a DOI or Handle). The import will fail if no PID is provided, or if the provided PID fails validation.
The optional release parameter tells Dataverse to immediately publish the dataset. If the parameter is changed to no, the imported dataset will remain in DRAFT status.
Parameters: - dataverse (str) – “alias” of the dataverse (e.g. root) or the database id of the dataverse (e.g. 1).
- pid (str) – PID of existing Dataset.
- publish (bool) – Publish only works when a Dataset with an existing PID is created. If it is True, the Dataset should be instantly published; False if a Draft should be created.
- metadata (str) – Metadata of the Dataset as a json-formatted string (e. g. dataset-finch1.json).
Returns: Response object of requests library.
Return type:
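The choice between the plain create endpoint and the :import endpoint can be sketched as follows. The function name create_dataset_url is hypothetical, and mapping publish to release=yes/no is an assumption drawn from the curl examples above, not a statement about pyDataverse internals.

```python
from urllib.parse import urlencode


def create_dataset_url(server, dataverse, pid=None, publish=False):
    """Pick the create or the import endpoint, depending on whether a PID is given."""
    base = f"{server}/api/dataverses/{dataverse}/datasets"
    if pid is None:
        # No PID: a new dataset is created in DRAFT state.
        return base
    # With a PID the dataset is imported; "release" controls publication.
    query = urlencode({"pid": pid, "release": "yes" if publish else "no"}, safe=":/")
    return f"{base}/:import?{query}"
```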
-
create_dataset_private_url
(identifier, is_pid=True, auth=True)[source]¶ Create private Dataset URL.
POST http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey
- http://guides.dataverse.org/en/4.16/api/native-api.html#create-a-private-url-for-a-dataset
-
create_dataverse
(parent: str, metadata: str, auth: bool = True) → requests.models.Response[source]¶ Create a dataverse.
Generates a new dataverse under parent. Expects JSON content describing the dataverse.
HTTP Request:
POST http://$SERVER/api/dataverses/$id
Download the dataverse.json example file and modify to create dataverses to suit your needs. The fields name, alias, and dataverseContacts are required.
- Status Codes:
- 200: dataverse created 201: dataverse created
Parameters: Returns: Response object of requests library.
Return type:
-
create_role
(dataverse_id)[source]¶ Create a new role.
HTTP Request:
POST http://$SERVER/api/roles?dvo=$dataverseIdtf&key=$apiKey
Parameters: dataverse_id (str) – Can be alias or id of a Dataverse. Returns: Response object of requests library. Return type: requests.Response
-
dataverse_id2alias
(dataverse_id, auth=False)[source]¶ Converts a Dataverse ID to an alias.
Parameters: dataverse_id (str) – Dataverse ID. Returns: Dataverse alias Return type: str
-
delete_dataset
(identifier, is_pid=True, auth=True)[source]¶ Delete a dataset.
Delete the dataset whose id is passed
- Status Code:
- 200: dataset deleted
Parameters: Returns: Response object of requests library.
Return type:
-
delete_dataset_private_url
(identifier, is_pid=True, auth=True)[source]¶ Delete private Dataset URL.
DELETE http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey
http://guides.dataverse.org/en/4.16/api/native-api.html#delete-the-private-url-from-a-dataset
-
delete_dataverse
(identifier, auth=True)[source]¶ Delete dataverse by alias or id.
- Status Code:
- 200: Dataverse deleted
Parameters: identifier (str) – Can either be a dataverse id (long) or a dataverse alias (more robust). Returns: Response object of requests library. Return type: requests.Response
-
delete_role
(role_id)[source]¶ Delete role.
Parameters: identifier (str) – Can be alias or id of a Dataverse. Returns: Response object of requests library. Return type: requests.Response
-
delete_user_api_token
()[source]¶ Delete a user’s API token.
HTTP Request:
curl -H X-Dataverse-key:$API_TOKEN -X DELETE $SERVER_URL/api/users/token
Returns: Response object of requests library. Return type: requests.Response
-
destroy_dataset
(identifier, is_pid=True, auth=True)[source]¶ Destroy Dataset.
http://guides.dataverse.org/en/4.16/api/native-api.html#delete-published-dataset
Normally published datasets should not be deleted, but there exists a “destroy” API endpoint for superusers which will act on a dataset given a persistent ID or dataset database ID:
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/:persistentId/destroy/?persistentId=doi:10.5072/FK2/AAA000
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/999/destroy
Calling the destroy endpoint is permanent and irreversible. It will remove the dataset and its datafiles, then re-index the parent dataverse in Solr. This endpoint requires the API token of a superuser.
-
edit_dataset_metadata
(identifier, metadata, is_pid=True, replace=False, auth=True)[source]¶ Edit metadata of a given dataset.
HTTP Request:
PUT http://$SERVER/api/datasets/editMetadata/$id --upload-file FILENAME
Add data to dataset fields that are blank or accept multiple values with the following
CURL Request:
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/editMetadata/?persistentId=$pid --upload-file dataset-add-metadata.json
For these edits your JSON file need only include those dataset fields which you would like to edit. A sample JSON file may be downloaded here: dataset-edit-metadata-sample.json
Parameters: - identifier (str) – Identifier of the dataset. Can be a Dataverse identifier or a persistent identifier (e.g. doi:10.11587/8H3N93).
- metadata (str) – Metadata of the Dataset as a json-formatted string.
- is_pid (bool) – True to use persistent identifier. False, if not.
- replace (bool) – True to replace already existing metadata. False, if not.
- auth (bool) – True, if an api token should be sent. Defaults to True.
Returns: Response object of requests library.
Return type:
Examples
Get dataset metadata:
>>> data = api.get_dataset(doi).json()["data"]["latestVersion"]["metadataBlocks"]["citation"]
>>> resp = api.edit_dataset_metadata(doi, data, replace=True, auth=True)
>>> resp.status_code
200: metadata updated
-
get_children
(parent=':root', parent_type='dataverse', children_types=None, auth=True)[source]¶ Walk through children of parent element in Dataverse tree.
Default: gets all child dataverses if parent = dataverse or all
Example Dataverse Tree:
data = {
    'type': 'dataverse',
    'dataverse_id': 1,
    'dataverse_alias': ':root',
    'children': [
        {
            'type': 'datasets',
            'dataset_id': 231,
            'pid': 'doi:10.11587/LYFDYC',
            'children': [
                {
                    'type': 'datafile',
                    'datafile_id': 532,
                    'pid': 'doi:10.11587/LYFDYC/C2WTRN',
                    'filename': '10082_curation.pdf'
                }
            ]
        }
    ]
}
Parameters: Returns: - list – List of Dataverse data type dictionaries. Different ones for Dataverses, Datasets and Datafiles.
- # TODO
- - differentiate between published and unpublished data types
- - util function to read out all dataverses into a list
- - util function to read out all datasets into a list
- - util function to read out all datafiles into a list
- - Unify tree and models
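Until such utility functions exist, a tree of this shape can be flattened with a short recursive walk. This is a sketch under the assumption that every node carries a type key and an optional children list, as in the example tree; the function name collect is hypothetical and not part of pyDataverse.

```python
def collect(nodes, wanted_type):
    """Return every node of the given type from a list of tree nodes, depth first."""
    found = []
    for node in nodes:
        if node.get("type") == wanted_type:
            found.append(node)
        # Descend into the children, if the node has any.
        found.extend(collect(node.get("children", []), wanted_type))
    return found


tree = [
    {
        "type": "dataverse",
        "dataverse_id": 1,
        "dataverse_alias": ":root",
        "children": [
            {
                "type": "datasets",
                "dataset_id": 231,
                "pid": "doi:10.11587/LYFDYC",
                "children": [
                    {
                        "type": "datafile",
                        "datafile_id": 532,
                        "pid": "doi:10.11587/LYFDYC/C2WTRN",
                        "filename": "10082_curation.pdf",
                    }
                ],
            }
        ],
    }
]

datafile_ids = [node["datafile_id"] for node in collect(tree, "datafile")]
```

The walk takes a list because the return value is documented as a list of dictionaries; a single root node can be wrapped in a one-element list.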
-
get_datafile_metadata
(identifier, is_filepid=False, is_draft=False, auth=True)[source]¶ GET http://$SERVER/api/files/{id}/metadata
curl $SERVER_URL/api/files/$ID/metadata
curl "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
curl "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/metadata/draft
-
get_datafiles_metadata
(pid, version=':latest', auth=True)[source]¶ List metadata of all datafiles of a dataset.
HTTP Request:
GET http://$SERVER/api/datasets/$id/versions/$versionId/files
Parameters: Returns: Response object of requests library.
Return type:
-
get_dataset
(identifier, version=':latest', auth=True, is_pid=True)[source]¶ Get metadata of a Dataset.
With Dataverse identifier:
GET http://$SERVER/api/datasets/$identifier
With persistent identifier:
GET http://$SERVER/api/datasets/:persistentId/?persistentId=$pid
Parameters: - identifier (str) – Identifier of the dataset. Can be a Dataverse identifier or a persistent identifier (e.g. doi:10.11587/8H3N93).
- is_pid (bool) – True, if identifier is a persistent identifier.
- version (str) – Version to be retrieved:
  - :latest-published: the latest published version.
  - :latest: either a draft (if exists) or the latest published version.
  - :draft: the draft version, if any.
  - x.y: a specific version, where x is the major version number and y is the minor version number.
  - x: same as x.0.
Returns: Response object of requests library.
Return type:
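The accepted version specifiers can be checked before a request is sent. The sketch below is not part of pyDataverse; the function name normalize_version is hypothetical, and expanding x to x.0 simply applies the rule stated in the parameter list.

```python
import re

KEYWORDS = {":latest", ":latest-published", ":draft"}


def normalize_version(version):
    """Validate a version specifier and expand a bare major number to x.0."""
    if version in KEYWORDS:
        return version
    if re.fullmatch(r"\d+", version):
        # "x" is documented as equivalent to "x.0".
        return f"{version}.0"
    if re.fullmatch(r"\d+\.\d+", version):
        return version
    raise ValueError(f"unsupported version specifier: {version!r}")
```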
-
get_dataset_export
(pid, export_format, auth=False)[source]¶ Get metadata of dataset exported in different formats.
Export the metadata of the current published version of a dataset in various formats by its persistent identifier.
GET http://$SERVER/api/datasets/export?exporter=$exportformat&persistentId=$pid
Parameters: Returns: Response object of requests library.
Return type:
-
get_dataset_lock
(pid)[source]¶ Check whether a dataset is locked.
The lock API endpoint was introduced in Dataverse 4.9.3.
Parameters: pid (str) – Persistent identifier of the Dataset (e.g. doi:10.11587/8H3N93). Returns: Response object of requests library. Return type: requests.Response
-
get_dataset_private_url
(identifier, is_pid=True, auth=True)[source]¶ Get private Dataset URL.
GET http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey
http://guides.dataverse.org/en/4.16/api/native-api.html#get-the-private-url-for-a-dataset
-
get_dataset_version
(identifier, version, auth=True, is_pid=True)[source]¶ Get version of a Dataset.
With Dataverse identifier:
GET http://$SERVER/api/datasets/$identifier/versions/$versionNumber
With persistent identifier:
GET http://$SERVER/api/datasets/:persistentId/versions/$versionNumber?persistentId=$id
Parameters: Returns: Response object of requests library.
Return type:
-
get_dataset_versions
(identifier, auth=True, is_pid=True)[source]¶ Get versions of a Dataset.
With Dataverse identifier:
GET http://$SERVER/api/datasets/$identifier/versions
With persistent identifier:
GET http://$SERVER/api/datasets/:persistentId/versions?persistentId=$id
Parameters: Returns: Response object of requests library.
Return type:
-
get_dataverse
(identifier, auth=False)[source]¶ Get dataverse metadata by alias or id.
View metadata about a dataverse.
GET http://$SERVER/api/dataverses/$id
Parameters: identifier (str) – Can either be a dataverse id (long), a dataverse alias (more robust), or the special value :root. Returns: Response object of requests library. Return type: requests.Response
-
get_dataverse_assignments
(identifier, auth=False)[source]¶ Get dataverse assignments by alias or id.
View assignments of a dataverse.
GET http://$SERVER/api/dataverses/$id/assignments
Parameters: identifier (str) – Can either be a dataverse id (long), a dataverse alias (more robust), or the special value :root. Returns: Response object of requests library. Return type: requests.Response
-
get_dataverse_contents
(identifier, auth=True)[source]¶ Gets contents of Dataverse.
Parameters: Returns: Response object of requests library.
Return type:
-
get_dataverse_facets
(identifier, auth=False)[source]¶ Get dataverse facets by alias or id.
View facets of a dataverse.
GET http://$SERVER/api/dataverses/$id/facets
Parameters: identifier (str) – Can either be a dataverse id (long), a dataverse alias (more robust), or the special value :root. Returns: Response object of requests library. Return type: requests.Response
-
get_dataverse_roles
(identifier: str, auth: bool = False) → requests.models.Response[source]¶ All the roles defined directly in the dataverse by identifier.
GET http://$SERVER/api/dataverses/$id/roles
Parameters: identifier (str) – Can either be a dataverse id (long), a dataverse alias (more robust), or the special value :root. Returns: Response object of requests library. Return type: requests.Response
-
get_info_api_terms_of_use
(auth=False)[source]¶ Get API Terms of Use url.
The response contains the text value inserted as API Terms of use which uses the database setting :ApiTermsOfUse.
HTTP Request:
GET http://$SERVER/api/info/apiTermsOfUse
Returns: Response object of requests library. Return type: requests.Response
-
get_info_server
(auth=False)[source]¶ Get dataverse server name.
This is useful when a Dataverse system is composed of multiple Java EE servers behind a load balancer.
HTTP Request:
GET http://$SERVER/api/info/server
Returns: Response object of requests library. Return type: requests.Response
-
get_info_version
(auth=False)[source]¶ Get the Dataverse version and build number.
The response contains the version and build numbers. Requires no api token.
HTTP Request:
GET http://$SERVER/api/info/version
Returns: Response object of requests library. Return type: requests.Response
-
get_metadatablock
(identifier, auth=False)[source]¶ Get info about single metadata block.
Returns data about the block whose identifier is passed. identifier can either be the block’s id, or its name.
HTTP Request:
GET http://$SERVER/api/metadatablocks/$identifier
Parameters: identifier (str) – Can be the block’s id, or its name. Returns: Response object of requests library. Return type: requests.Response
-
get_metadatablocks
(auth=False)[source]¶ Get info about all metadata blocks.
Lists brief info about all metadata blocks registered in the system.
HTTP Request:
GET http://$SERVER/api/metadatablocks
Returns: Response object of requests library. Return type: requests.Response
-
get_user
()[source]¶ Get details of the current authenticated user.
Auth must be true for this to work. API endpoint is available for Dataverse >= 5.3.
https://guides.dataverse.org/en/latest/api/native-api.html#get-user-information-in-json-format
-
get_user_api_token_expiration_date
(auth=False)[source]¶ Get the expiration date of a user’s API token.
HTTP Request:
curl -H X-Dataverse-key:$API_TOKEN -X GET $SERVER_URL/api/users/token
Returns: Response object of requests library. Return type: requests.Response
-
publish_dataset
(pid, release_type='minor', auth=True)[source]¶ Publish dataset.
Publishes the dataset whose id is passed. If this is the first version of the dataset, its version number will be set to 1.0. Otherwise, the new dataset version number is determined by the most recent version number and the type parameter. Passing type=minor increases the minor version number (2.3 is updated to 2.4). Passing type=major increases the major version number (2.3 is updated to 3.0). Superusers can pass type=updatecurrent to update metadata without changing the version number.
HTTP Request:
POST http://$SERVER/api/datasets/$id/actions/:publish?type=$type
When there are no default workflows, a successful publication process will result in 200 OK response. When there are workflows, it is impossible for Dataverse to know how long they are going to take and whether they will succeed or not (recall that some stages might require human intervention). Thus, a 202 ACCEPTED is returned immediately. To know whether the publication process succeeded or not, the client code has to check the status of the dataset periodically, or perform some push request in the post-publish workflow.
- Status Code:
- 200: dataset published
Parameters: - pid (str) – Persistent identifier of the dataset (e.g. doi:10.11587/8H3N93).
- release_type (str) – Passing minor increases the minor version number (2.3 is updated to 2.4). Passing major increases the major version number (2.3 is updated to 3.0). Superusers can pass updatecurrent to update metadata without changing the version number.
- auth (bool) – True if api authorization is necessary. Defaults to True.
Returns: Response object of requests library.
Return type:
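The version numbering rules above can be written out as a small function. This is only an illustration of the rules as stated, not Dataverse's server-side implementation; the name next_version is hypothetical, and None stands for a dataset that has never been published.

```python
def next_version(current, release_type="minor"):
    """Return the version number a dataset gets after publishing."""
    if current is None:
        # The first published version is always 1.0.
        return "1.0"
    major, minor = (int(part) for part in current.split("."))
    if release_type == "minor":
        return f"{major}.{minor + 1}"
    if release_type == "major":
        return f"{major + 1}.0"
    if release_type == "updatecurrent":
        # Metadata is updated in place; the number does not change.
        return current
    raise ValueError(f"unknown release type: {release_type!r}")
```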
-
publish_dataverse
(identifier, auth=True)[source]¶ Publish a dataverse.
Publish the Dataverse pointed to by identifier, which can either be the dataverse alias or its numerical id.
HTTP Request:
POST http://$SERVER/api/dataverses/$identifier/actions/:publish
- Status Code:
- 200: Dataverse published
Parameters: Returns: Response object of requests library.
Return type:
-
recreate_user_api_token
()[source]¶ Recreate a user’s API token.
HTTP Request:
curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/users/token/recreate
Returns: Response object of requests library. Return type: requests.Response
-
redetect_file_type
(identifier: str, is_pid: bool = False, dry_run: bool = False) → requests.models.Response[source]¶ Redetect file type.
https://guides.dataverse.org/en/latest/api/native-api.html#redetect-file-type
Parameters: Returns: Request Response() object.
Return type: Response
-
reingest_datafile
(identifier: str, is_pid: bool = False) → requests.models.Response[source]¶ Reingest datafile.
https://guides.dataverse.org/en/latest/api/native-api.html#reingest-a-file
Parameters: Returns: Request Response() object.
Return type: Response
-
replace_datafile
(identifier, filename, json_str, is_filepid=True)[source]¶ Replace datafile.
HTTP Request:
POST -F 'file=@file.extension' -F 'jsonData={json}' http://$SERVER/api/files/{id}/replace?key={apiKey}
Parameters: Returns: The json string responded by the CURL request, converted to a dict().
Return type:
-
restrict_datafile
(identifier: str, is_pid: bool = False) → requests.models.Response[source]¶ Restrict datafile.
https://guides.dataverse.org/en/latest/api/native-api.html#restrict-files
Parameters: Returns: Request Response() object.
Return type: Response
-
show_role
(role_id, auth=False)[source]¶ Show role.
HTTP Request:
GET http://$SERVER/api/roles/$id
Parameters: identifier (str) – Can be alias or id of a Dataverse. Returns: Response object of requests library. Return type: requests.Response
-
uningest_datafile
(identifier: str, is_pid: bool = False) → requests.models.Response[source]¶ Uningest datafile.
https://guides.dataverse.org/en/latest/api/native-api.html#uningest-a-file
Parameters: Returns: Request Response() object.
Return type: Response
-
update_datafile_metadata
(identifier, json_str=None, is_filepid=False)[source]¶ Update datafile metadata.
metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced: Updates the file metadata for an existing file where ID is the database id of the file to update or PERSISTENT_ID is the persistent id (DOI or Handle) of the file. Requires a jsonString expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you want to update a specific field first get the json with the above command and alter the fields you want.
Also note that dataFileTags are not versioned and changes to these will update the published version of the file.
This function needs curl to work!
HTTP Request:
POST -F 'file=@file.extension' -F 'jsonData={json}' http://$SERVER/api/files/{id}/metadata?key={apiKey}
curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' $SERVER_URL/api/files/$ID/metadata
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
Docs.
Parameters: Returns: The json string responded by the CURL request, converted to a dict().
Return type:
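Because no metadata from the previous version is carried over, a safe pattern is to start from the current metadata and change only the fields of interest. The sketch below assumes the current metadata is already available as a dict; the function name merged_json_data is hypothetical, and the field names are the ones used in the curl examples above.

```python
import json


def merged_json_data(current, changes):
    """Merge changed fields into the current metadata and serialise for jsonData."""
    merged = dict(current)   # copy, so the caller's dict is left untouched
    merged.update(changes)   # only the listed fields are overwritten
    return json.dumps(merged)


current = {"description": "My description.", "categories": ["Data"], "restrict": False}
json_str = merged_json_data(current, {"description": "My description bbb."})
```

The resulting string is what would be passed as json_str; fields that are not repeated in it are not preserved by the endpoint.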
-
upload_datafile
(identifier, filename, json_str=None, is_pid=True)[source]¶ Add file to a dataset.
Add a file to an existing Dataset. Description and tags are optional:
HTTP Request:
POST http://$SERVER/api/datasets/$id/add
The upload endpoint checks the content of the file, compares it with existing files and tells if already in the database (most likely via hashing).
Parameters: Returns: The json string responded by the CURL request, converted to a dict().
Return type:
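A content-based duplicate check of the kind described above can be sketched with a checksum comparison. Both the use of hashing and the choice of SHA-256 are assumptions made for illustration; the text above only says hashing is the most likely mechanism, and the function names are hypothetical.

```python
import hashlib


def checksum(content):
    """Return a hex digest of the file content (bytes)."""
    return hashlib.sha256(content).hexdigest()


def is_duplicate(content, known_checksums):
    """True if a file with identical content has been seen before."""
    return checksum(content) in known_checksums


known = {checksum(b"a,b\n1,2\n")}
```

Two files with different names but identical bytes produce the same digest, which is why a renamed copy would still be reported as already present.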
-
-
class
SearchApi
(base_url, api_token=None, api_version='latest')[source]¶ Class to access Dataverse’s Search API.
-
base_url_api_search
¶ type – Description of attribute base_url_api_search.
-
base_url
¶ type – Description of attribute base_url.
-
-
class
SwordApi
(base_url, api_version='v1.1', api_token=None, sword_api_version='v1.1')[source]¶ Class to access Dataverse’s SWORD API.
Parameters: sword_api_version (str) – SWORD API version. Defaults to ‘v1.1’. -
base_url_api_sword
¶ str – Description of attribute base_url_api_sword.
-
base_url
¶ str – Description of attribute base_url.
-
native_api_version
¶ str – Description of attribute native_api_version.
-
sword_api_version
¶
-
Models Interface¶
Use all metadata models of the Dataverse data-types (Dataverse, Dataset and Datafile). This includes import, export and manipulation.
Dataverse data-types data model.
-
class
DVObject
(data=None)[source]¶ Base class for the Dataverse data types Dataverse, Dataset and Datafile.
-
from_json
(json_str, data_format=None, validate=True, filename_schema=None)[source]¶ Import metadata from a JSON file.
Parses in the metadata from different JSON formats.
Parameters: - json_str (str) – JSON string to be imported.
- data_format (str) – Data formats available for import. See _allowed_json_formats.
- validate (bool) – True, if imported JSON should be validated against a JSON schema file. False, if JSON string should be imported directly and not checked if valid.
- filename_schema (str) – Filename of JSON schema with full path.
Returns: True if JSON imported correctly, False if not.
Return type:
-
get
()[source]¶ Create flat dict of all attributes.
Creates dict with all attributes in a flat structure. The flat dict can then be used for further processing.
Returns: Data in a flat data structure. Return type: dict
-
json
(data_format=None, validate=True, filename_schema=None)[source]¶ Create JSON from DVObject attributes.
Parameters: Returns: The data as a JSON string.
Return type:
-
set
(data)[source]¶ Set class attributes by a flat dictionary.
The flat dict is the main way to set the class attributes. It is the main interface between the object and the outside world.
Parameters: data (dict) – Flat dictionary. All keys will be mapped to a similarly named attribute and its value. Returns: True if all attributes are set, False if wrong data type was passed. Return type: bool
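The flat-dict round trip between set() and get() can be illustrated with a stand-in class. FlatObject is hypothetical and is not the pyDataverse DVObject implementation; it only mirrors the documented behaviour of mapping keys to attributes and returning False for a wrong data type.

```python
class FlatObject:
    """Illustrative stand-in for an object driven by a flat dictionary."""

    def set(self, data):
        # Reject anything that is not a dict, as the documented return value implies.
        if not isinstance(data, dict):
            return False
        for key, value in data.items():
            setattr(self, key, value)
        return True

    def get(self):
        # Return a copy so callers cannot mutate the object's state by accident.
        return dict(vars(self))


obj = FlatObject()
ok = obj.set({"title": "My dataset", "author": "Doe"})
```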
-
-
class
Datafile
(data=None)[source]¶ Base class for the Dataverse data type Datafile.
-
_default_json_format
¶ str – Default JSON data format.
-
_default_json_schema_filename
¶ str – Default JSON schema filename.
-
_allowed_json_formats
¶ list – List of all possible JSON data formats.
-
from_json
(json_str, data_format=None, validate=True, filename_schema=None)¶ Import metadata from a JSON file.
Parses in the metadata from different JSON formats.
Parameters: - json_str (str) – JSON string to be imported.
- data_format (str) – Data formats available for import. See _allowed_json_formats.
- validate (bool) – True, if imported JSON should be validated against a JSON schema file. False, if JSON string should be imported directly and not checked if valid.
- filename_schema (str) – Filename of JSON schema with full path.
Returns: True if JSON imported correctly, False if not.
Return type:
-
get
()¶ Create flat dict of all attributes.
Creates dict with all attributes in a flat structure. The flat dict can then be used for further processing.
Returns: Data in a flat data structure. Return type: dict
-
json
(data_format=None, validate=True, filename_schema=None)¶ Create JSON from DVObject attributes.
Parameters: Returns: The data as a JSON string.
Return type:
-
set
(data)¶ Set class attributes by a flat dictionary.
The flat dict is the main way to set the class attributes. It is the main interface between the object and the outside world.
Parameters: data (dict) – Flat dictionary. All keys will be mapped to a similarly named attribute and its value. Returns: True if all attributes are set, False if wrong data type was passed. Return type: bool
-
-
class
Dataset
(data=None)[source]¶ Base class for the Dataverse data type Dataset.
-
_default_json_format
¶ str – Default JSON data format.
-
_default_json_schema_filename
¶ str – Default JSON schema filename.
-
_allowed_json_formats
¶ list – List of all possible JSON data formats.
-
__attr_import_dv_up_datasetVersion_values
¶ list – Dataverse API Upload Dataset JSON attributes inside ds[‘datasetVersion’].
-
__attr_import_dv_up_citation_fields_values
¶ list – Dataverse API Upload Dataset JSON attributes inside ds[‘datasetVersion’][‘metadataBlocks’][‘citation’][‘fields’].
-
__attr_import_dv_up_citation_fields_arrays
¶ dict – Dataverse API Upload Dataset JSON attributes inside [‘datasetVersion’][‘metadataBlocks’][‘citation’][‘fields’].
-
__attr_import_dv_up_geospatial_fields_values
¶ list – Attributes of Dataverse API Upload Dataset JSON metadata standard inside [‘datasetVersion’][‘metadataBlocks’][‘geospatial’][‘fields’].
-
__attr_import_dv_up_geospatial_fields_arrays
¶ dict – Attributes of Dataverse API Upload Dataset JSON metadata standard inside [‘datasetVersion’][‘metadataBlocks’][‘geospatial’][‘fields’].
list – Attributes of Dataverse API Upload Dataset JSON metadata standard inside [‘datasetVersion’][‘metadataBlocks’][‘socialscience’][‘fields’].
-
__attr_import_dv_up_journal_fields_values
¶ list – Attributes of Dataverse API Upload Dataset JSON metadata standard inside [‘datasetVersion’][‘metadataBlocks’][‘journal’][‘fields’].
-
__attr_import_dv_up_journal_fields_arrays
¶ dict – Attributes of Dataverse API Upload Dataset JSON metadata standard inside [‘datasetVersion’][‘metadataBlocks’][‘journal’][‘fields’].
-
__attr_dict_dv_up_required
¶ list – Required attributes for valid dv_up metadata dict creation.
-
__attr_dict_dv_up_type_class_primitive
¶ list – typeClass primitive.
-
__attr_dict_dv_up_type_class_compound
¶ list – typeClass compound.
-
__attr_dict_dv_up_type_class_controlled_vocabulary
¶ list – typeClass controlledVocabulary.
-
__attr_dict_dv_up_single_dict
¶ list – These attributes are excluded from automatic parsing in ds.get() creation.
-
__attr_displayNames
¶ list – Attributes of displayName.
-
from_json
(json_str, data_format=None, validate=True, filename_schema=None)[source]¶ Import Dataset metadata from JSON file.
Parses the metadata of a Dataset from different JSON formats.
Parameters: - json_str (str) – JSON string to be imported.
- data_format (str) – Data formats available for import. See _allowed_json_formats.
- validate (bool) – True, if imported JSON should be validated against a JSON schema file. False, if JSON string should be imported directly and not checked if valid.
- filename_schema (str) – Filename of JSON schema with full path.
Examples
Import Dataset metadata from a JSON file:
>>> from pyDataverse.models import Dataset
>>> from pyDataverse.utils import read_file
>>> ds = Dataset()
>>> ds.from_json(read_file('tests/data/dataset_upload_min_default.json'))
>>> ds.title
"Darwin's Finches"
-
get
()¶ Create flat dict of all attributes.
Creates a dict with all attributes in a flat structure. The flat dict can then be used for further processing.
Returns: Data in a flat data structure. Return type: dict
-
json
(data_format=None, validate=True, filename_schema=None)[source]¶ Create Dataset JSON from attributes.
Returns: The data as a JSON string.
Return type: str
-
set
(data)¶ Set class attributes by a flat dictionary.
The flat dict is the main way to set the class attributes. It is the main interface between the object and the outside world.
Parameters: data (dict) – Flat dictionary. All keys will be mapped to a similarly named attribute and its value. Returns: True if all attributes are set, False if the wrong data type was passed. Return type: bool
-
validate_json
(filename_schema=None)[source]¶ Validate JSON formats of Dataset.
Check if JSON data structure is valid.
Parameters: filename_schema (str) – Filename of JSON schema with full path. Returns: True if the JSON validates correctly, False if not. Return type: bool
Examples
Check if JSON is valid for Dataverse API upload:
>>> from pyDataverse.models import Dataset
>>> ds = Dataset()
>>> data = {
...     'title': 'pyDataverse study 2019',
...     'dsDescription': [
...         {'dsDescriptionValue': 'New study about pyDataverse usage in 2019'}
...     ]
... }
>>> ds.set(data)
>>> print(ds.validate_json())
False
>>> ds.author = [{'authorName': 'LastAuthor1, FirstAuthor1'}]
>>> ds.datasetContact = [{'datasetContactName': 'LastContact1, FirstContact1'}]
>>> ds.subject = ['Engineering']
>>> print(ds.validate_json())
True
-
-
class
Dataverse
(data=None)[source]¶ Base class for the Dataverse data type Dataverse.
-
_default_json_format
¶ str – Default JSON data format.
-
_default_json_schema_filename
¶ str – Default JSON schema filename.
-
_allowed_json_formats
¶ list – List of all possible JSON data formats.
-
from_json
(json_str, data_format=None, validate=True, filename_schema=None)¶ Import metadata from a JSON file.
Parses the metadata from different JSON formats.
Parameters: - json_str (str) – JSON string to be imported.
- data_format (str) – Data formats available for import. See _allowed_json_formats.
- validate (bool) – True, if imported JSON should be validated against a JSON schema file. False, if JSON string should be imported directly and not checked if valid.
- filename_schema (str) – Filename of JSON schema with full path.
Returns: True if JSON imported correctly, False if not.
Return type: bool
-
get
()¶ Create flat dict of all attributes.
Creates a dict with all attributes in a flat structure. The flat dict can then be used for further processing.
Returns: Data in a flat data structure. Return type: dict
-
json
(data_format=None, validate=True, filename_schema=None)¶ Create JSON from DVObject attributes.
Returns: The data as a JSON string.
Return type: str
-
set
(data)¶ Set class attributes by a flat dictionary.
The flat dict is the main way to set the class attributes. It is the main interface between the object and the outside world.
Parameters: data (dict) – Flat dictionary. All keys will be mapped to a similarly named attribute and its value. Returns: True if all attributes are set, False if the wrong data type was passed. Return type: bool
-
Utils Interface¶
Helper functions.
-
clean_string
(string)[source]¶ Clean a string.
Trims whitespace.
Parameters: string (str) – String to be cleaned. Returns: Cleaned string. Return type: str
-
create_datafile_url
(base_url, identifier, is_filepid)[source]¶ Creates URL of Datafile.
Example - File ID: https://data.aussda.at/file.xhtml?persistentId=doi:10.11587/CCESLK/5RH5GK
Returns: URL of the datafile.
Return type: str
-
create_dataset_url
(base_url, identifier, is_pid)[source]¶ Creates URL of Dataset.
Example: https://data.aussda.at/dataset.xhtml?persistentId=doi:10.11587/CCESLK
Returns: URL of the dataset.
Return type: str
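The example URL above suggests how a persistent identifier ends up in the query string. The following is a minimal sketch under stated assumptions, not the code behind create_dataset_url(): `build_dataset_url` is a hypothetical helper, and the `id` query parameter used for non-PID identifiers is an assumption that does not come from this page.

```python
from urllib.parse import urlencode


def build_dataset_url(base_url, identifier, is_pid=True):
    """Hypothetical helper that assembles a dataset landing-page URL."""
    base = base_url.rstrip("/")
    if is_pid:
        # Keep ':' and '/' unescaped so the DOI reads as in the example URL.
        query = urlencode({"persistentId": identifier}, safe=":/")
    else:
        # Assumption: database IDs are passed as an 'id' query parameter.
        query = urlencode({"id": identifier})
    return f"{base}/dataset.xhtml?{query}"


url = build_dataset_url("https://data.aussda.at", "doi:10.11587/CCESLK")
```

With the base URL and DOI from the example, the sketch reproduces the documented address exactly.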
-
create_dataverse_url
(base_url, identifier)[source]¶ Creates URL of Dataverse.
Example: https://data.aussda.at/dataverse/autnes
Returns: URL of the dataverse.
Return type: str
-
dataverse_tree_walker
(data: list, dv_keys: list = ['dataverse_id', 'dataverse_alias'], ds_keys: list = ['dataset_id', 'pid'], df_keys: list = ['datafile_id', 'filename', 'pid', 'label']) → tuple[source]¶ Walk through a Dataverse tree by get_children().
Recursively walks through the tree structure returned by get_children() and extracts the keys needed.
Returns: (List of Dataverses, List of Datasets, List of Datafiles).
Return type: tuple
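A recursive walk of this kind can be sketched as follows. The shape of the tree is an assumption, since this page does not specify it: each node is taken to be a dict with a `type` of `dataverse`, `dataset` or `datafile` and an optional `children` list. `walk_tree` is a hypothetical name, the DOI in the sample is made up, and only the default key lists come from the signature above.

```python
def walk_tree(nodes,
              dv_keys=("dataverse_id", "dataverse_alias"),
              ds_keys=("dataset_id", "pid"),
              df_keys=("datafile_id", "filename", "pid", "label")):
    """Hypothetical sketch: collect selected keys per node type, recursively."""
    dataverses, datasets, datafiles = [], [], []
    # Map each assumed node type to the keys to keep and the list to fill.
    targets = {
        "dataverse": (dv_keys, dataverses),
        "dataset": (ds_keys, datasets),
        "datafile": (df_keys, datafiles),
    }
    for node in nodes:
        target = targets.get(node.get("type"))
        if target is not None:
            keys, bucket = target
            # Keep only the requested keys that are present on this node.
            bucket.append({key: node[key] for key in keys if key in node})
        # Descend into the children and merge their results.
        sub_dv, sub_ds, sub_df = walk_tree(
            node.get("children", []), dv_keys, ds_keys, df_keys
        )
        dataverses.extend(sub_dv)
        datasets.extend(sub_ds)
        datafiles.extend(sub_df)
    return dataverses, datasets, datafiles


# Made-up sample tree in the assumed shape.
tree = [{
    "type": "dataverse", "dataverse_id": 1, "dataverse_alias": "root",
    "children": [{
        "type": "dataset", "dataset_id": 10, "pid": "doi:10.0000/EXAMPLE",
        "children": [{
            "type": "datafile", "datafile_id": 100,
            "filename": "data.csv", "label": "data.csv",
        }],
    }],
}]
dataverses, datasets, datafiles = walk_tree(tree)
```

Returning three flat lists instead of a nested structure makes the result easy to save to separate files or to iterate over per object type.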
-
read_csv
(filename, newline='', delimiter=',', quotechar='"', encoding='utf-8')[source]¶ Read in a CSV file.
See more at csv.
Returns: Reader object, which can be iterated over.
Return type: reader
-
read_csv_as_dicts
(filename, newline='', delimiter=',', quotechar='"', encoding='utf-8', remove_prefix=True, prefix='dv.', json_cols=['otherId', 'series', 'author', 'dsDescription', 'subject', 'keyword', 'topicClassification', 'language', 'grantNumber', 'dateOfCollection', 'kindOfData', 'dataSources', 'otherReferences', 'contributor', 'relatedDatasets', 'relatedMaterial', 'datasetContact', 'distributor', 'producer', 'publication', 'software', 'timePeriodCovered', 'geographicUnit', 'geographicBoundingBox', 'geographicCoverage', 'socialScienceNotes', 'unitOfAnalysis', 'universe', 'targetSampleActualSize', 'categories'], false_values=['FALSE'], true_values=['TRUE'])[source]¶ Read in a CSV file into a list of dict.
This offers easy import of your data from CSV files. See more at csv.
CSV file structure:
1) The header row contains the column names.
2) Each row contains one dataset.
3) Each column contains one specific attribute.
Recommendation: Name each column the way you want the attribute to be named later in your Dataverse object. See the pyDataverse templates for this. The created dict can later be passed to the set() function to create Dataverse objects.
Returns: List with one dict per row. The keys of each dict are named after the column names.
Return type: list
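The CSV layout described above (header row, one dataset per row, one attribute per column) can be read with the standard library alone. This is a minimal sketch, not the code behind read_csv_as_dicts(): `rows_as_dicts` is a hypothetical name, the sample columns are made up, and the way the prefix, JSON columns and TRUE/FALSE values are handled is inferred from the parameter names in the signature.

```python
import csv
import io
import json


def rows_as_dicts(text, delimiter=",", quotechar='"', prefix="dv.",
                  json_cols=("author",),
                  true_values=("TRUE",), false_values=("FALSE",)):
    """Hypothetical sketch: parse CSV text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text),
                            delimiter=delimiter, quotechar=quotechar)
    rows = []
    for raw in reader:
        row = {}
        for column, cell in raw.items():
            # Assumption: a leading prefix such as 'dv.' is stripped.
            key = column[len(prefix):] if column.startswith(prefix) else column
            if cell in true_values:
                row[key] = True
            elif cell in false_values:
                row[key] = False
            elif key in json_cols and cell:
                # Assumption: cells in JSON columns hold JSON text.
                row[key] = json.loads(cell)
            else:
                row[key] = cell
        rows.append(row)
    return rows


# Made-up sample: header row plus one dataset row.
sample = (
    'dv.title,dv.author,dv.restricted\n'
    '"Example study","[{""authorName"": ""Doe, Jane""}]",FALSE\n'
)
rows = rows_as_dicts(sample)
```

Each resulting dict has the flat shape that the set() methods of the model classes expect as input.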
-
read_file
(filename, mode='r', encoding='utf-8')[source]¶ Read in a file.
Parameters: - filename (str) – Filename with full path.
- mode (str) – Read mode of file. Defaults to r. See more at https://docs.python.org/3.5/library/functions.html#open
Returns: Data as a string.
Return type: str
-
read_json
(filename: str, mode: str = 'r', encoding: str = 'utf-8') → dict[source]¶ Read in a json file.
See more about the json module at https://docs.python.org/3.5/library/json.html
Parameters: - filename (str) – Filename with full path.
- mode (str) – Read mode of file. Defaults to r. See more at https://docs.python.org/3.5/library/functions.html#open
- encoding (str) – Character encoding of file. Defaults to ‘utf-8’.
Returns: Data parsed from the JSON file.
Return type: dict
-
read_pickle
(filename)[source]¶ Read in pickle file.
See more at pickle.
Parameters: filename (str) – Full filename with path of file. Returns: Data object. Return type: dict
-
save_tree_data
(dataverses: list, datasets: list, datafiles: list, filename_dv: str = 'dataverses.json', filename_ds: str = 'datasets.json', filename_df: str = 'datafiles.json', filename_md: str = 'metadata.json') → None[source]¶ Save lists from data returned by dataverse_tree_walker().
Collects lists of Dataverses, Datasets and Datafiles and saves them in separate JSON files.
Parameters: - data (dict) – Tree data structure returned by get_children().
- filename_dv (str) – Filename with full path for the Dataverse JSON file.
- filename_ds (str) – Filename with full path for the Dataset JSON file.
- filename_df (str) – Filename with full path for the Datafile JSON file.
- filename_md (str) – Filename with full path for the metadata JSON file.
-
validate_data
(data: dict, filename_schema: str, file_format: str = 'json') → bool[source]¶ Validate data against a schema.
Returns: True if data was validated, False if not.
Return type: bool
-
write_csv
(data, filename, newline='', delimiter=',', quotechar='"', encoding='utf-8')[source]¶ Write data to a CSV file.
See more at csv.
Parameters: - data (list) – List of dict. Key is column, value is cell content.
- filename (str) – Full filename with path of file.
- newline (str) – Newline character.
- delimiter (str) – Cell delimiter of CSV file. Defaults to ‘,’.
- quotechar (str) – Quote-character of CSV file. Defaults to ‘”’.
- encoding (str) – Character encoding of file. Defaults to ‘utf-8’.
-
write_dicts_as_csv
(data, fieldnames, filename, delimiter=',', quotechar='"')[source]¶ Write dict to a CSV file.
This offers easy export of your data to a CSV file. See more at csv.
Parameters: - data (dict) – Dictionary with columns as keys, to be written in the CSV file.
- fieldnames (list) – Sequence of keys that identify the order of the columns.
- filename (str) – Filename with full path.
- delimiter (str) – Cell delimiter of CSV file. Defaults to ‘,’.
- quotechar (str) – Quote-character of CSV file. Defaults to ‘”’.
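The role of fieldnames as the column order can be illustrated with csv.DictWriter from the standard library. This is a minimal sketch rather than the code behind write_dicts_as_csv(): `dicts_to_csv_text` is a hypothetical name, it assumes the data is a list of dicts (one per row), and it writes to an in-memory buffer instead of a file so it can run anywhere.

```python
import csv
import io


def dicts_to_csv_text(rows, fieldnames, delimiter=",", quotechar='"'):
    """Hypothetical sketch: serialise a list of dicts to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=delimiter, quotechar=quotechar)
    # The header and every row follow the order given by fieldnames.
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = dicts_to_csv_text(
    [{"subject": "Engineering", "title": "Example study"}],
    ["title", "subject"],
)
lines = csv_text.splitlines()
```

Note that the column order in the output follows fieldnames, not the key order of the input dicts.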
-
write_file
(filename, data, mode='w', encoding='utf-8')[source]¶ Write data in a file.
Parameters: - filename (str) – Filename with full path.
- data (str) – Data to be stored.
- mode (str) – Write mode of file. Defaults to w. See more at https://docs.python.org/3.5/library/functions.html#open
- encoding (str) – Character encoding of file. Defaults to ‘utf-8’.
-
write_json
(filename, data, mode='w', encoding='utf-8')[source]¶ Write data to a json file.
Parameters: - filename (str) – Filename with full path.
- data (dict) – Data to be written in the JSON file.
- mode (str) – Write mode of file. Defaults to w. See more at https://docs.python.org/3/library/functions.html#open
- encoding (str) – Character encoding of file. Defaults to ‘utf-8’.
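The JSON helpers above mirror each other, which can be shown as a round trip with the standard library. This is a minimal sketch under stated assumptions, not pyDataverse's implementation: `write_json_file` and `read_json_file` are hypothetical names that only borrow the documented parameter order and defaults, and the indentation setting is an arbitrary choice.

```python
import json
import os
import tempfile


def write_json_file(filename, data, mode="w", encoding="utf-8"):
    """Hypothetical sketch: serialise data to a JSON file."""
    with open(filename, mode, encoding=encoding) as f:
        json.dump(data, f, indent=2)


def read_json_file(filename, mode="r", encoding="utf-8"):
    """Hypothetical sketch: parse a JSON file back into Python objects."""
    with open(filename, mode, encoding=encoding) as f:
        return json.load(f)


# Round trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.json")
    original = {"title": "Example study", "subject": ["Engineering"]}
    write_json_file(path, original)
    restored = read_json_file(path)
```

Passing the encoding explicitly on both sides avoids platform-dependent defaults when files move between systems.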