d3m / datasets
Commit 47b82e2a, authored Dec 12, 2019 by Swaroop Vattam
Merge branch 'version4' into 'master'
Version4
See merge request d3m/datasets!1
Parents: 2b0c227f, b71865e6
Pipeline #9 passed with stage in 22 minutes
Changes: 302 (showing 20 changed files with 310 additions and 2131 deletions)
.gitlab-ci.yml (+2 -2)
README.md (+10 -6)
seed_datasets_current/124_188_usps/124_188_usps_dataset/datasetDoc.json (+13 -9)
seed_datasets_current/124_188_usps/124_188_usps_problem/problemDoc.json (+29 -6)
seed_datasets_current/124_188_usps/SCORE/baseline_scores.csv (+0 -2)
seed_datasets_current/124_188_usps/SCORE/dataset_SCORE/datasetDoc.json (+13 -9)
seed_datasets_current/124_188_usps/SCORE/problem_SCORE/problemDoc.json (+30 -7)
seed_datasets_current/124_188_usps/TEST/dataset_TEST/datasetDoc.json (+13 -9)
seed_datasets_current/124_188_usps/TEST/problem_TEST/problemDoc.json (+30 -7)
seed_datasets_current/124_188_usps/TRAIN/dataset_TRAIN/datasetDoc.json (+13 -9)
seed_datasets_current/124_188_usps/TRAIN/problem_TRAIN/problemDoc.json (+30 -7)
seed_datasets_current/124_188_usps/mitll_predictions.csv (+0 -2008)
seed_datasets_current/1491_one_hundred_plants_margin/1491_one_hundred_plants_margin_dataset/datasetDoc.json (+8 -6)
seed_datasets_current/1491_one_hundred_plants_margin/1491_one_hundred_plants_margin_problem/problemDoc.json (+29 -6)
seed_datasets_current/1491_one_hundred_plants_margin/SCORE/baseline_scores.csv (+0 -2)
seed_datasets_current/1491_one_hundred_plants_margin/SCORE/dataset_TEST/datasetDoc.json (+8 -6)
seed_datasets_current/1491_one_hundred_plants_margin/SCORE/problem_TEST/problemDoc.json (+33 -9)
seed_datasets_current/1491_one_hundred_plants_margin/TEST/dataset_TEST/datasetDoc.json (+8 -6)
seed_datasets_current/1491_one_hundred_plants_margin/TEST/problem_TEST/problemDoc.json (+33 -9)
seed_datasets_current/1491_one_hundred_plants_margin/TRAIN/dataset_TRAIN/datasetDoc.json (+8 -6)
.gitlab-ci.yml @ 47b82e2a
...
@@ -11,7 +11,7 @@ test:
     - git lfs fetch --all
     - pip3 install cerberus==1.3.1 deep_dircmp==0.1.0
     - git clone --recursive https://gitlab.com/datadrivendiscovery/data-supply.git
-    - git -C data-supply checkout 51efe8f74ae2ec223a1540782945beee1f05bf00
+    - git -C data-supply checkout 4d67a8acee3fe5236900137a528bc48cf05731a3
   script:
     - |
...
@@ -25,7 +25,7 @@ test:
       if [ "${CI_COMMIT_REF_NAME}" = master ]; then
         if [ -n "${GIT_ACCESS_USER}" -a -n "${GIT_ACCESS_TOKEN}" ]; then
           echo "Pushing updated digests."
-          git remote set-url --push origin "https://${GIT_ACCESS_USER}:${GIT_ACCESS_TOKEN}@datasets.datadrivendiscovery.org/${CI_PROJECT_PATH}.git"
+          git remote set-url --push origin "https://${GIT_ACCESS_USER}:${GIT_ACCESS_TOKEN}@gitlab.datadrivendiscovery.org/${CI_PROJECT_PATH}.git"
           git config --local user.email noreply@datadrivendiscovery.org
           git config --local user.name "D3M CI"
           if ! git diff --quiet ; then
...
README.md @ 47b82e2a
-# Public D3M datasets
+# Private D3M datasets
-This repository contains public D3M datasets.
+This repository contains private D3M datasets.
+**Do not distribute them.**
+**Public D3M datasets are available [here](https://datasets.datadrivendiscovery.org/d3m/datasets).**
 Please report any issues with private datasets in [data-supply repository](https://gitlab.com/datadrivendiscovery/data-supply/issues).
 Datasets schemas and related documentation is available in [data-supply repository](https://gitlab.com/datadrivendiscovery/data-supply).
...
@@ -9,15 +13,15 @@ Datasets schemas and related documentation is available in [data-supply reposito
 Download datasets using [git LFS](https://git-lfs.github.com/):

 ```
-$ git lfs clone git@datasets.datadrivendiscovery.org:d3m/datasets.git
+$ git lfs clone git@gitlab.datadrivendiscovery.org:d3m/datasets.git
 ```

 Note, use `git lfs clone` instead of `git clone` because it
 is faster.

 This will take time but especially disk space. Currently all
-datasets are around 46 GB, but the whole directory with cloned
-repository and git metadata is around 65 GB. Running
+datasets are around 54 GB, but the whole directory with cloned
+repository and git metadata is around 84 GB. Running
 `git lfs prune` might help by removing old and unreferenced files.

 Repository is organized so that all files larger than 100 KB are
...
@@ -31,7 +35,7 @@ It is possible to download only part of the repository. First clone
 without downloading files managed by git LFS:

 ```
-$ git lfs clone git@datasets.datadrivendiscovery.org:d3m/datasets.git -X "*"
+$ git lfs clone git@gitlab.datadrivendiscovery.org:d3m/datasets.git -X "*"
 ```

 This will download and checkout all files smaller than 100 KB.
...
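The partial-download flow described in this README hunk can be sketched in Python as follows. This is a minimal sketch, not part of the repository: the helper name `partial_clone_commands` and the example include pattern are hypothetical, and the follow-up `git lfs pull -I` step is an assumption about how the excluded files would be fetched later (the README text after this hunk is not shown). The commands are only built and printed, not executed.

```python
import shlex

# Repository URL as it appears on the new side of the README diff.
REPO_URL = "git@gitlab.datadrivendiscovery.org:d3m/datasets.git"


def partial_clone_commands(repo_url, include_pattern):
    """Build argv lists for a clone without LFS files, then a selective pull.

    Hypothetical helper: nothing is executed here.
    """
    # Step 1 (from the README): clone while excluding every LFS-managed file.
    clone = ["git", "lfs", "clone", repo_url, "-X", "*"]
    # Step 2 (assumption): inside the clone, fetch only matching LFS files.
    pull = ["git", "lfs", "pull", "-I", include_pattern]
    return [clone, pull]


commands = partial_clone_commands(REPO_URL, "seed_datasets_current/124_188_usps/")
for argv in commands:
    # shlex.join quotes the "*" so the shell does not expand it.
    print(shlex.join(argv))
```

The second command would have to be run from inside the cloned working tree; passing the argv lists to `subprocess.run` instead of printing them is one way to execute the steps.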
seed_datasets_current/124_188_usps/124_188_usps_dataset/datasetDoc.json @ 47b82e2a
...
@@ -7,28 +7,32 @@
     "license": "open",
     "source": "USPS",
     "sourceURI": "http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html",
-    "datasetSchemaVersion": "3.2.0",
+    "datasetSchemaVersion": "4.0.0",
     "redacted": false,
-    "datasetVersion": "2.0",
-    "digest": "c75da4fea7a7d4e4c67b9ff8175ab646e2fab606e2006a2d929e3cc8de6d7012"
+    "datasetVersion": "4.0.0",
+    "digest": "dc64b78bce3f4a88dfdb0ebd834bbce42b81bb137c8bd4cd152db6777be5d62d"
   },
   "dataResources": [
     {
       "resID": "0",
       "resPath": "media/",
       "resType": "image",
-      "resFormat": [
-        "image/png"
-      ],
+      "resFormat": {
+        "image/png": [
+          "png"
+        ]
+      },
       "isCollection": true
     },
     {
       "resID": "learningData",
       "resPath": "tables/learningData.csv",
       "resType": "table",
-      "resFormat": [
-        "text/csv"
-      ],
+      "resFormat": {
+        "text/csv": [
+          "csv"
+        ]
+      },
       "isCollection": false,
       "columns": [
         {
...
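In this hunk `resFormat` changes from a list of media types to an object mapping each media type to a list of file extensions. A minimal Python sketch of that conversion is below. It is an illustration only, not the tool used to produce this commit: the function name `upgrade_res_format` and the `KNOWN_EXTENSIONS` table are hypothetical, and the table contains only the two pairs visible in the diff.

```python
# Hypothetical lookup table: only the media types that appear in this diff.
KNOWN_EXTENSIONS = {
    "image/png": ["png"],
    "text/csv": ["csv"],
}


def upgrade_res_format(resource):
    """Return a copy of a dataResources entry with resFormat as a mapping.

    Entries whose resFormat is not a list (already a mapping, or absent)
    are copied unchanged.
    """
    upgraded = dict(resource)
    res_format = resource.get("resFormat")
    if not isinstance(res_format, list):
        return upgraded
    mapping = {}
    for media_type in res_format:
        if media_type not in KNOWN_EXTENSIONS:
            # Refuse to guess extensions for media types not seen in the diff.
            raise KeyError(f"no known extensions for media type {media_type!r}")
        mapping[media_type] = list(KNOWN_EXTENSIONS[media_type])
    upgraded["resFormat"] = mapping
    return upgraded


old_resource = {
    "resID": "learningData",
    "resPath": "tables/learningData.csv",
    "resType": "table",
    "resFormat": ["text/csv"],
    "isCollection": False,
}
new_resource = upgrade_res_format(old_resource)
print(new_resource["resFormat"])
```

Raising on unknown media types rather than guessing keeps the sketch from inventing extension lists that the schema may not accept.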
seed_datasets_current/124_188_usps/124_188_usps_problem/problemDoc.json @ 47b82e2a
...
@@ -3,10 +3,13 @@
     "problemID": "124_188_usps_problem",
     "problemName": "usps_problem",
     "problemDescription": "Multiclass image classification problem. Each image belongs to one of 10 classes.",
-    "taskType": "classification",
-    "taskSubType": "multiClass",
-    "problemSchemaVersion": "3.2.0",
-    "problemVersion": "2.0"
+    "problemSchemaVersion": "4.0.0",
+    "problemVersion": "4.0.0",
+    "taskKeywords": [
+      "classification",
+      "multiClass",
+      "image"
+    ]
   },
   "inputs": {
     "data": [
...
@@ -27,7 +30,27 @@
       "testSize": 0.216,
       "stratified": false,
       "numRepeats": 0,
-      "splitsFile": "dataSplits.csv"
+      "splitsFile": "dataSplits.csv",
+      "datasetViewMaps": {
+        "train": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TRAIN"
+          }
+        ],
+        "test": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TEST"
+          }
+        ],
+        "score": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_SCORE"
+          }
+        ]
+      }
     },
     "performanceMetrics": [
       {
...
@@ -38,4 +61,4 @@
   "expectedOutputs": {
     "predictionsFile": "predictions.csv"
   }
-}
+}
\ No newline at end of file
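The `about` block in this file drops `taskType` and `taskSubType` and gains a single `taskKeywords` list. The Python sketch below shows one way such a rewrite could be expressed. It is a hedged illustration, not the migration script behind this commit: `upgrade_about` is a hypothetical name, the `"image"` keyword cannot be derived from the old fields and is passed in explicitly, and the version strings are simply the values seen in this diff.

```python
def upgrade_about(about, extra_keywords=()):
    """Return a copy of a problemDoc 'about' block in the newer shape.

    taskType and taskSubType are folded into taskKeywords, in that order,
    followed by any extra keywords supplied by the caller.
    """
    upgraded = {
        key: value
        for key, value in about.items()
        if key not in ("taskType", "taskSubType")
    }
    keywords = []
    for value in (about.get("taskType"), about.get("taskSubType"), *extra_keywords):
        # Skip missing fields and avoid duplicate keywords.
        if value and value not in keywords:
            keywords.append(value)
    # Version values copied from the new side of the diff (assumed fixed here).
    upgraded["problemSchemaVersion"] = "4.0.0"
    upgraded["problemVersion"] = "4.0.0"
    upgraded["taskKeywords"] = keywords
    return upgraded


old_about = {
    "problemID": "124_188_usps_problem",
    "problemName": "usps_problem",
    "taskType": "classification",
    "taskSubType": "multiClass",
    "problemSchemaVersion": "3.2.0",
    "problemVersion": "2.0",
}
new_about = upgrade_about(old_about, extra_keywords=["image"])
print(new_about["taskKeywords"])
```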
seed_datasets_current/124_188_usps/SCORE/baseline_scores.csv @ 2b0c227f (deleted, mode 100644 → 0)
,metric,value
0,accuracy,0.8345789735924265
seed_datasets_current/124_188_usps/SCORE/dataset_SCORE/datasetDoc.json @ 47b82e2a
...
@@ -7,28 +7,32 @@
     "license": "open",
     "source": "USPS",
     "sourceURI": "http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html",
-    "datasetSchemaVersion": "3.2.0",
+    "datasetSchemaVersion": "4.0.0",
     "redacted": false,
-    "datasetVersion": "2.0",
-    "digest": "a8f27ab2e3cb1443d8cfbe2b3a286774ef8012b0c6ba4e41e029f4eda06f20f7"
+    "datasetVersion": "4.0.0",
+    "digest": "73d76f80d119e04a0aef61cd014d312df5d786fae76a592ef8c0932f0a509914"
   },
   "dataResources": [
     {
       "resID": "0",
       "resPath": "media/",
       "resType": "image",
-      "resFormat": [
-        "image/png"
-      ],
+      "resFormat": {
+        "image/png": [
+          "png"
+        ]
+      },
       "isCollection": true
     },
     {
       "resID": "learningData",
       "resPath": "tables/learningData.csv",
       "resType": "table",
-      "resFormat": [
-        "text/csv"
-      ],
+      "resFormat": {
+        "text/csv": [
+          "csv"
+        ]
+      },
       "isCollection": false,
       "columns": [
         {
...
seed_datasets_current/124_188_usps/SCORE/problem_SCORE/problemDoc.json @ 47b82e2a
 {
   "about": {
-    "problemID": "124_188_usps_problem_SCORE",
+    "problemID": "124_188_usps_problem",
     "problemName": "usps_problem",
     "problemDescription": "Multiclass image classification problem. Each image belongs to one of 10 classes.",
-    "taskType": "classification",
-    "taskSubType": "multiClass",
-    "problemSchemaVersion": "3.2.0",
-    "problemVersion": "2.0"
+    "problemSchemaVersion": "4.0.0",
+    "problemVersion": "4.0.0",
+    "taskKeywords": [
+      "classification",
+      "multiClass",
+      "image"
+    ]
   },
   "inputs": {
     "data": [
       {
-        "datasetID": "124_188_usps_dataset_SCORE",
+        "datasetID": "124_188_usps_dataset",
         "targets": [
           {
             "targetIndex": 0,
...
@@ -27,7 +30,27 @@
       "testSize": 0.216,
       "stratified": false,
       "numRepeats": 0,
-      "splitsFile": "dataSplits.csv"
+      "splitsFile": "dataSplits.csv",
+      "datasetViewMaps": {
+        "train": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TRAIN"
+          }
+        ],
+        "test": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TEST"
+          }
+        ],
+        "score": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_SCORE"
+          }
+        ]
+      }
     },
     "performanceMetrics": [
       {
...
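After this change the SCORE problem refers to the unsuffixed `124_188_usps_dataset`, and the new `datasetViewMaps` object records which concrete dataset each view (`train`, `test`, `score`) maps to. A minimal Python sketch of how a consumer might resolve a dataset ID through these maps follows. The function name `resolve_dataset_id` is hypothetical, and falling back to the original ID when no entry matches is an assumption of this sketch, not documented behavior.

```python
# View maps copied from the new side of the diff.
DATASET_VIEW_MAPS = {
    "train": [{"from": "124_188_usps_dataset", "to": "124_188_usps_dataset_TRAIN"}],
    "test": [{"from": "124_188_usps_dataset", "to": "124_188_usps_dataset_TEST"}],
    "score": [{"from": "124_188_usps_dataset", "to": "124_188_usps_dataset_SCORE"}],
}


def resolve_dataset_id(view_maps, dataset_id, view):
    """Map a dataset ID to its view-specific ID.

    Assumption: when the view or the dataset ID has no entry, the ID is
    returned unchanged.
    """
    for entry in view_maps.get(view, []):
        if entry["from"] == dataset_id:
            return entry["to"]
    return dataset_id


for view in ("train", "test", "score"):
    print(view, resolve_dataset_id(DATASET_VIEW_MAPS, "124_188_usps_dataset", view))
```

Taking the maps as a plain argument keeps the sketch independent of where `datasetViewMaps` sits inside the full problem document, which this hunk does not show.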
seed_datasets_current/124_188_usps/TEST/dataset_TEST/datasetDoc.json @ 47b82e2a
...
@@ -7,28 +7,32 @@
     "license": "open",
     "source": "USPS",
     "sourceURI": "http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html",
-    "datasetSchemaVersion": "3.2.0",
+    "datasetSchemaVersion": "4.0.0",
     "redacted": false,
-    "datasetVersion": "2.0",
-    "digest": "6861672270a24f3f199f7175b4904e6a588e1ed7433ed7cb1bdbbdd0225f8274"
+    "datasetVersion": "4.0.0",
+    "digest": "2c26e78b65ae0ae3ee6dc2ad6f28eaec138c6c37a23552bd33b3db2461a0652b"
   },
   "dataResources": [
     {
       "resID": "0",
       "resPath": "media/",
       "resType": "image",
-      "resFormat": [
-        "image/png"
-      ],
+      "resFormat": {
+        "image/png": [
+          "png"
+        ]
+      },
       "isCollection": true
     },
     {
       "resID": "learningData",
       "resPath": "tables/learningData.csv",
       "resType": "table",
-      "resFormat": [
-        "text/csv"
-      ],
+      "resFormat": {
+        "text/csv": [
+          "csv"
+        ]
+      },
       "isCollection": false,
       "columns": [
         {
...
seed_datasets_current/124_188_usps/TEST/problem_TEST/problemDoc.json @ 47b82e2a
 {
   "about": {
-    "problemID": "124_188_usps_problem_TEST",
+    "problemID": "124_188_usps_problem",
     "problemName": "usps_problem",
     "problemDescription": "Multiclass image classification problem. Each image belongs to one of 10 classes.",
-    "taskType": "classification",
-    "taskSubType": "multiClass",
-    "problemSchemaVersion": "3.2.0",
-    "problemVersion": "2.0"
+    "problemSchemaVersion": "4.0.0",
+    "problemVersion": "4.0.0",
+    "taskKeywords": [
+      "classification",
+      "multiClass",
+      "image"
+    ]
   },
   "inputs": {
     "data": [
       {
-        "datasetID": "124_188_usps_dataset_TEST",
+        "datasetID": "124_188_usps_dataset",
         "targets": [
           {
             "targetIndex": 0,
...
@@ -27,7 +30,27 @@
       "testSize": 0.216,
       "stratified": false,
       "numRepeats": 0,
-      "splitsFile": "dataSplits.csv"
+      "splitsFile": "dataSplits.csv",
+      "datasetViewMaps": {
+        "train": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TRAIN"
+          }
+        ],
+        "test": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TEST"
+          }
+        ],
+        "score": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_SCORE"
+          }
+        ]
+      }
     },
     "performanceMetrics": [
       {
...
seed_datasets_current/124_188_usps/TRAIN/dataset_TRAIN/datasetDoc.json @ 47b82e2a
...
@@ -7,28 +7,32 @@
     "license": "open",
     "source": "USPS",
     "sourceURI": "http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html",
-    "datasetSchemaVersion": "3.2.0",
+    "datasetSchemaVersion": "4.0.0",
     "redacted": false,
-    "datasetVersion": "2.0",
-    "digest": "4fdd5af8fca34e57aeaf7fe085c174c1050efeb45f0646670a4134e341fea6c1"
+    "datasetVersion": "4.0.0",
+    "digest": "33b71f121dcd1466895e4f83ff20272ae57397582277359a862a7cbade2c4cf4"
   },
   "dataResources": [
     {
       "resID": "0",
       "resPath": "media/",
       "resType": "image",
-      "resFormat": [
-        "image/png"
-      ],
+      "resFormat": {
+        "image/png": [
+          "png"
+        ]
+      },
       "isCollection": true
     },
     {
       "resID": "learningData",
       "resPath": "tables/learningData.csv",
       "resType": "table",
-      "resFormat": [
-        "text/csv"
-      ],
+      "resFormat": {
+        "text/csv": [
+          "csv"
+        ]
+      },
       "isCollection": false,
       "columns": [
         {
...
seed_datasets_current/124_188_usps/TRAIN/problem_TRAIN/problemDoc.json @ 47b82e2a
 {
   "about": {
-    "problemID": "124_188_usps_problem_TRAIN",
+    "problemID": "124_188_usps_problem",
     "problemName": "usps_problem",
     "problemDescription": "Multiclass image classification problem. Each image belongs to one of 10 classes.",
-    "taskType": "classification",
-    "taskSubType": "multiClass",
-    "problemSchemaVersion": "3.2.0",
-    "problemVersion": "2.0"
+    "problemSchemaVersion": "4.0.0",
+    "problemVersion": "4.0.0",
+    "taskKeywords": [
+      "classification",
+      "multiClass",
+      "image"
+    ]
   },
   "inputs": {
     "data": [
       {
-        "datasetID": "124_188_usps_dataset_TRAIN",
+        "datasetID": "124_188_usps_dataset",
         "targets": [
           {
             "targetIndex": 0,
...
@@ -27,7 +30,27 @@
       "testSize": 0.216,
       "stratified": false,
       "numRepeats": 0,
-      "splitsFile": "dataSplits.csv"
+      "splitsFile": "dataSplits.csv",
+      "datasetViewMaps": {
+        "train": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TRAIN"
+          }
+        ],
+        "test": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_TEST"
+          }
+        ],
+        "score": [
+          {
+            "from": "124_188_usps_dataset",
+            "to": "124_188_usps_dataset_SCORE"
+          }
+        ]
+      }
     },
     "performanceMetrics": [
       {
...
seed_datasets_current/124_188_usps/mitll_predictions.csv @ 2b0c227f (deleted, mode 100644 → 0)
d3mIndex,label
7291,5
7292,7
7293,4
7294,3
7295,7
7296,1
7297,1
7298,1
7299,7
7300,5
7301,7
7302,3
7303,3
7304,5
7305,1
7306,10
7307,2
7308,7
7309,10
7310,7
7311,3
7312,3
7313,10
7314,5
7315,7
7316,3
7317,1
7318,4
7319,6
7320,4
7321,8
7322,1
7323,6
7324,1
7325,8
7326,10
7327,9
7328,1
7329,1
7330,8
7331,1
7332,5
7333,2
7334,1
7335,8
7336,2
7337,1
7338,10
7339,3
7340,1
7341,1
7342,6
7343,2
7344,1
7345,1
7346,7
7347,6
7348,10
7349,3
7350,1
7351,10
7352,1
7353,5
7354,3
7355,1
7356,10
7357,2
7358,1
7359,3
7360,2
7361,10
7362,4
7363,8
7364,3
7365,1
7366,5
7367,1
7368,2
7369,3
7370,2
7371,3
7372,1
7373,8
7374,1
7375,1
7376,7
7377,8
7378,1
7379,3
7380,1
7381,10
7382,1
7383,9
7384,9
7385,3
7386,1
7387,8
7388,1
7389,1
7390,10
7391,1
7392,7
7393,8
7394,4
7395,1
7396,8
7397,3
7398,10
7399,4
7400,5
7401,4
7402,4
7403,10
7404,10
7405,8
7406,1
7407,6
7408,5
7409,5
7410,7
7411,9
7412,1
7413,3
7414,1
7415,7
7416,5
7417,6
7418,7
7419,2
7420,10
7421,6
7422,5
7423,1
7424,2
7425,7
7426,10
7427,1
7428,1
7429,8
7430,8
7431,5