Commit b742dfcc authored by Swaroop Vattam

updated README

parent 5fadb311
Pipeline #41 passed with stage
in 151 minutes and 42 seconds
# Public D3M datasets
This repository contains public D3M datasets.
The first step in building a thriving AutoML research community is making sure that enough high-quality datasets are available to it. This corpus contains a large number of datasets collected and developed under DARPA's D3M program. Each dataset was curated and annotated with extensive metadata so that the AutoML community has challenging datasets that go beyond simple tabular data and cover a rich set of problem types and data types. These include classification (binary, multi-class, and multi-label) and regression (univariate and multivariate) over tabular, text, image, video, and audio data; time series forecasting; object detection; graph problems such as link prediction, vertex nomination, community detection, and collaborative filtering; multi-table relational data; and multiple-instance learning. The corpus aims to unite researchers in discovering new frontiers of AutoML research.
## Organization
This corpus is organized into seed datasets and training datasets.

    .
    ├── seed_datasets
    └── training_datasets
        ├── LL0
        └── LL1

`seed_datasets` contains sample datasets that provide a flavor of all the major data types and problem types. `training_datasets` contains many more datasets and is used for developing deeper AutoML capabilities. Within `training_datasets`, `LL0` contains simpler level 0 datasets (tabular datasets) and `LL1` contains harder level 1 datasets (raw data, graph data, relational data, etc.).
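As a rough illustration of this layout, the following Python sketch lists what sits under each collection directory in a local checkout. It assumes that each dataset is stored as one subdirectory of its collection, which this README does not state explicitly. The function name `list_datasets` and the `COLLECTIONS` constant are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Collection directories named in this README, relative to the corpus root.
COLLECTIONS = ("seed_datasets", "training_datasets/LL0", "training_datasets/LL1")


def list_datasets(root):
    """Map each collection to the sorted names of its immediate subdirectories.

    Assumption: each dataset is one subdirectory of its collection.
    Collections missing from the checkout map to an empty list.
    """
    root = Path(root)
    found = {}
    for collection in COLLECTIONS:
        folder = root / collection
        if folder.is_dir():
            # Plain files (notes, checksums, and so on) are ignored.
            found[collection] = sorted(p.name for p in folder.iterdir() if p.is_dir())
        else:
            found[collection] = []
    return found
```

Calling `list_datasets(".")` from the root of a checkout returns a dictionary keyed by collection path, which may be a convenient starting point for iterating over the simpler `LL0` datasets before moving on to `LL1`.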
## Downloading