As a result of the data integration, the data has been exported in four formats for downstream analysis.
For more details on the static files exported for ACDE, please refer to the Jupyter notebook ACDE_Export.ipynb.
Format
Description
Database Dump
A MongoDB database, acd_engine, that consists of 8 collections: event, organisation, person, place, recognition, relationship, resource and work. This type of data is intended primarily for development purposes.
CSV
Each csv file corresponds to a specific entity in ACDEA and contains data organised according to the attributes of that entity. The columns of the csv file represent the first-level attributes of the corresponding entity.
Throughout the Data Analysis section of the book, we provide guidance on how to download csv files directly from Google Drive using Python. Alternatively, you can manually download these unified datasets from the ACDE Google Drive repository.
JSONL
jsonl is a format that stores each data record as an individual json object on its own line, with records separated by a newline character. Each jsonl file corresponds to a specific entity in ACDEA; once loaded in Python, the data is structured as a list of dictionaries, where each dictionary represents a data record.
As with the csv files, you can download the unified data as jsonl files by following the Data Analysis section of the book, or manually download them from the ACDE Google Drive repository.
XLSX
Each XLSX file contains a 10% sample of the records for a specific entity in ACDEA. Similar to the CSV files, the columns of each XLSX file represent the first-level attributes of the corresponding entity. This type of data is intended for exploratory purposes.
You can download the sample files using the download buttons below.
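To make the csv and jsonl descriptions above concrete, here is a minimal sketch of loading either format into a list of dictionaries using only the Python standard library. It assumes the files have already been downloaded manually. The helper names `read_csv_records` and `read_jsonl_records`, and the columns `id` and `name`, are hypothetical placeholders rather than part of ACDE; the real files carry the first-level attributes of each entity.

```python
import csv
import io
import json


def read_csv_records(stream):
    """Return one dict per CSV row, keyed by the header row (the column names)."""
    return list(csv.DictReader(stream))


def read_jsonl_records(stream):
    """Return one dict per non-empty line; each line holds one JSON object."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical two-record samples standing in for downloaded files.
# The column names "id" and "name" are illustrative assumptions only.
csv_text = "id,name\n1,Alice\n2,Bob\n"
jsonl_text = '{"id": "1", "name": "Alice"}\n{"id": "2", "name": "Bob"}\n'

csv_records = read_csv_records(io.StringIO(csv_text))
jsonl_records = read_jsonl_records(io.StringIO(jsonl_text))

print(csv_records[0]["name"])
print(len(jsonl_records))
```

With a real file, pass an open file handle instead of the in-memory samples, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded file. Note that the csv reader returns every value as a string, whereas jsonl preserves the types (numbers, nested objects) stored in each record.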