The Configurable Data Curation System (CDCS) is a family of systems for structuring, searching, and sharing data. The two primary system types in this family are the Materials Data Curation System (MDCS), referred to here as the curator, and the Materials Resource Registry (MRR), referred to here as the registry. The curator is used to structure data and enter it into accessible online data repositories. The registry is used to locate data in distributed data repositories. Both were originally created in the materials science community under the Materials Genome Initiative (MGI).
MDCS 2.0 is the first release of the 2.0 series curator, in which all of the functionality from the previous major curator release (1.5) has been ported into a modular core of configurable applications. Over the remainder of the MDCS 2.0 series, functionality from the registry, as well as functionality developed by early adopters and in customizations of both curator and registry systems, is being ported into the MDCS 2.0 core.
This document captures the essential functionality of the MDCS 2.0 curator system. It will be expanded and updated over time as more functionality becomes available.
The essential functionality of the curator includes the following:
Function | Description |
---|---|
General tasks | General user tasks such as requesting an account, logging in, etc. |
Structuring data | Creating data entry templates with the composer application. |
Entering data | Inputting data using data entry templates with the curator application. |
Accessing data | Controlling data access via creation and use of users, groups, and workspaces. |
Finding data | Searching for data locally and remotely. |
Transforming data | Transforming data from one format or representation to another. |
Managing resources | Managing resources and configurations on the curator platform. |
Operations | Operational tasks such as deployment, creating custom themes, etc. |
The system is implemented with a Python-based web framework (Django) and uses a NoSQL database (MongoDB) as its primary data store. XML technologies serve as its primary data representation for system artifacts: data templates and types (XML schemas), data documents and records (XML documents), data transformations (XSLT), data entry form generation (XML-based), and data query representation (XML-based).
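
To make the idea of an XML document as a data record more concrete, the following is a minimal sketch using only the Python standard library. It is not MDCS code: the record layout, the element names (`sample`, `measurement`), and the helper `find_measurement` are hypothetical and chosen only for illustration. It shows how a record stored as XML can be parsed and queried by path.

```python
import xml.etree.ElementTree as ET

# Hypothetical data record. The element and attribute names are
# illustrative placeholders, not taken from any actual MDCS template.
RECORD_XML = """
<sample>
  <id>sample-001</id>
  <measurement name="length" unit="mm">12.5</measurement>
  <measurement name="width" unit="mm">3.0</measurement>
</sample>
"""


def find_measurement(record_xml, name):
    """Return (value, unit) for the named measurement, or None if absent."""
    root = ET.fromstring(record_xml.strip())
    # ElementTree supports a limited XPath subset, including
    # attribute-equality predicates such as [@name='length'].
    node = root.find(f"./measurement[@name='{name}']")
    if node is None:
        return None
    return float(node.text), node.get("unit")


if __name__ == "__main__":
    print(find_measurement(RECORD_XML, "length"))
```

In a real deployment the record structure would be defined by an XML schema template rather than assumed by the calling code, and validation or XSLT transformation would need a library beyond the standard library.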