Integrated Data

As a result of the data integration, the data is available in four formats for downstream analysis.

For more details on the static files exported for ACDE, please refer to the Jupyter notebook ACDE_Export.ipynb.

Format	Description
Database Dump	A MongoDB database `acd_engine` that consists of 8 collections: `event`, `organisation`, `person`, `place`, `recognition`, `relationship`, `resource` and `work`. This type of data is intended primarily for development purposes.
CSV	Each `csv` file corresponds to a specific entity in ACDEA and contains data organised according to the attributes of that entity. The columns of the `csv` file represent the first-level attributes of the corresponding entity. Throughout the Data Analysis section of the book, we provide guidance on how to download `csv` files directly from Google Drive using Python. Alternatively, you can manually download these unified datasets from the ACDE Google Drive repository.
JSONL	`jsonl` is a format that stores data records as individual JSON objects, one per line, separated by newline characters. Each `jsonl` file corresponds to a specific entity in ACDEA; once loaded, the data is structured as a list of dictionaries, where each dictionary represents a data record. As with the `csv` files, you can download the unified data as `jsonl` files by following the Data Analysis section of the book, or manually download them from the ACDE Google Drive repository.
XLSX	Each XLSX file contains a 10% sample of a specific entity in ACDEA. As with the CSV files, the columns of each XLSX file represent the first-level attributes of the corresponding entity. This type of data is intended for exploratory purposes. You can download the sample files using the download buttons below.
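
To make the `jsonl` description above concrete, the following is a minimal sketch of how such a file could be read into a list of dictionaries using only the Python standard library. The helper `read_jsonl`, the sample records and their field names (`_id`, `display_name`, `birth`) are hypothetical and chosen for illustration only; they are not taken from the actual ACDE exports.

```python
import io
import json


def read_jsonl(stream):
    """Parse a JSONL stream into a list of dictionaries, one per non-empty line."""
    records = []
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            records.append(json.loads(line))
    return records


# Hypothetical in-memory sample standing in for a real `jsonl` file;
# the field names are assumptions, not the ACDEA schema.
sample = io.StringIO(
    '{"_id": "1", "display_name": "Example Person", "birth": {"year": 1900}}\n'
    '{"_id": "2", "display_name": "Another Person", "birth": {"year": 1925}}\n'
)

records = read_jsonl(sample)

# The top-level keys of a record play the role of the first-level attributes
# that appear as columns in the corresponding `csv` and XLSX files.
first_level = sorted(records[0].keys())
```

For a file on disk, the same helper could be called as `read_jsonl(open(path, encoding="utf-8"))`, where `path` points to a downloaded `jsonl` file. If pandas is available, `pandas.read_json(path, lines=True)` is an alternative that returns a DataFrame directly; nested attributes such as the hypothetical `birth` above would then remain as dictionaries inside a single column.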