Citation: @article{krizhevsky2009learning,title={Learning multiple layers of features from tiny images},author={Krizhevsky, Alex and Hinton, Geoffrey},year={2009},publisher={Technical report, University of Toronto}}
-----------------END------------------
\ No newline at end of file
ID: 124_174_cifar10_MIN_METADATA_dataset
Name: cifar10
Description: Image recognition dataset consisting of 60000 32x32 colour images in 10 classes, with 6000 images per class.
Citation: @article{Cai11SRKDA, author = {Deng Cai and Xiaofei He and Jiawei Han}, title = {Speed Up Kernel Discriminant Analysis}, journal = {The VLDB Journal}, volume = {20}, number = {1},year = {2011}, pages = {21-33}}
-----------------END------------------
\ No newline at end of file
ID: 124_188_usps_MIN_METADATA_dataset
Name: usps
Description: Image recognition dataset consisting of 9298 16x16 images of 10 handwritten digits.
Citation: @article{nene1996columbia,title={Columbia object image library (coil-20)},author={Nene, Sameer A and Nayar, Shree K and Murase, Hiroshi and others},year={1996},publisher={Technical report CUCS-005-96}}
-----------------END------------------
\ No newline at end of file
ID: 124_214_coil20_MIN_METADATA_dataset
Name: coil-20
Description: Image recognition dataset of 20 objects from 72 different views.
This dataset was collected for use within the DARPA Data Driven Discovery of Models (D3M) program.
ID: 1491_one_hundred_plants_margin
Name: 1491_one_hundred_plants_margin_dataset
Description: Plant Leaf Classification Using Probabilistic Integration of Shape, Texture and Margin Features. Signal Processing, Pattern Recognition and Applications, in press. 2013.
### Description
...
...
@@ -23,11 +20,21 @@ There is a total of 1600 samples with 16 samples per leaf class (100 classes), a
This dataset was collected for use within the DARPA Data Driven Discovery of Models (D3M) program.
ID: 1567_poker_hand
Name: 1567_poker_hand_dataset
Description: Purpose is to predict poker hands
ID: 1567_poker_hand_MIN_METADATA_dataset
Name: poker_hand
Description: Purpose is to predict poker hands
Each record is an example of a hand consisting of five playing cards drawn from a standard deck of 52. Each card is described using two attributes (suit and rank), for a total of 10 predictive attributes. There is one Class attribute that describes the "Poker Hand". The order of cards is important, which is why there are 480 possible Royal Flush hands as compared to 4 (one for each suit).
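The figure of 480 follows from counting ordered hands: the five cards of a royal flush can appear in any of 5! = 120 orders, in each of 4 suits. A minimal Python sketch of that arithmetic (the function name is illustrative and not part of the dataset):

```python
from math import factorial


def ordered_royal_flush_count(num_suits: int = 4, hand_size: int = 5) -> int:
    """Count royal flush hands when the order of the cards matters."""
    # Each suit has exactly one unordered royal flush (10, J, Q, K, A);
    # every permutation of those five cards is a distinct ordered hand.
    return num_suits * factorial(hand_size)


print(ordered_royal_flush_count())  # 4 * 120 = 480
```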
...
...
@@ -59,11 +56,21 @@ Ordinal (0-9)
R. Cattral, F. Oppacher, D. Deugo. Evolutionary Data Mining with Automatic Rule Generalization. Recent Advances in Computers, Computing and Communications, pp.296-300, WSEAS Press, 2002.
Note: This was a slightly different dataset that had more classes, and was considerably more difficult
Citation: @misc{Dua:2019 ,author = {Dua, Dheeru and Graff, Casey},year = {2017},title = {{UCI} Machine Learning Repository},url = {http://archive.ics.uci.edu/ml},institution = {University of California, Irvine, School of Information and Computer Sciences} }
**Please cite**: Jeffrey S. Simonoff, Analyzing Categorical Data, Springer-Verlag, New York, 2003
Database of baseball players and play statistics, including 'Games_played', 'At_bats', 'Runs', 'Hits', 'Doubles', 'Triples', 'Home_runs', 'RBIs', 'Walks', 'Strikeouts', 'Batting_average', 'On_base_pct', 'Slugging_pct' and 'Fielding_ave'
Notes:
* Quotes, Single-Quotes and Backslashes were removed, Blanks replaced with Underscores
* Player is an identifier that should be ignored when modelling the data
Description: This is a tabular dataset comprising about 7000 instances, split into 5600 training and 1400 test instances. Each instance has about 12 mixed categorical and float features.
License: Creative Commons Attribution 4.0 International
This dataset was collected for use within the DARPA Data Driven Discovery of Models (D3M) program.
ID: 299_libras_move
Name: 299_libras_move_dataset
Description: LIBRAS Movement Database
ID: 299_libras_move_MIN_METADATA_dataset
Name: libras_move
Description: LIBRAS Movement Database
LIBRAS, an acronym of the Portuguese name "LIngua BRAsileira de Sinais", is the official Brazilian sign language. The dataset (movement_libras) contains 15 classes of 24 instances each, where each class refers to a hand movement type in LIBRAS. The hand movement is represented as a bidimensional curve traced by the hand over a period of time. The curves were obtained from videos of hand movements, with the LIBRAS performance from 4 different people, during 2 sessions. Each video corresponds to only one hand movement and lasts about 7 seconds. Each video corresponds to a function F in a function space, which is the continuous version of the input dataset. In the video pre-processing, a time normalization is carried out by selecting 45 frames from each video according to a uniform distribution. In each frame, the centroid pixels of the segmented object (the hand) are found, which compose the discrete version of the curve F with 45 points. All curves are normalized in the unitary space.
To prepare these movements for analysis by algorithms, we carried out a mapping operation: each curve F is mapped to a representation with 90 features representing the coordinates of the movement.
Each instance represents 45 points on a bi-dimensional space, which can be plotted in an ordered way (from 1 through 45 as the X coordinate) in order to draw the path of the movement.
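As a rough illustration of how the 90 features relate to the 45 curve points, the sketch below regroups a feature vector into (x, y) pairs. It assumes the features are interleaved as x1, y1, x2, y2, and so on; that layout and the function name are assumptions for illustration and should be checked against the dataset's attribute documentation.

```python
from typing import List, Sequence, Tuple

NUM_POINTS = 45  # frames sampled per video, per the description


def features_to_points(features: Sequence[float]) -> List[Tuple[float, float]]:
    """Regroup a 90-value feature vector into 45 ordered (x, y) points.

    Assumes an interleaved layout (x1, y1, x2, y2, ...), which is an
    assumption of this sketch rather than something stated above.
    """
    if len(features) != 2 * NUM_POINTS:
        raise ValueError(f"expected {2 * NUM_POINTS} features, got {len(features)}")
    # Step through the vector two values at a time, keeping frame order.
    return [(features[i], features[i + 1]) for i in range(0, 2 * NUM_POINTS, 2)]
```

Plotting the resulting points in order would draw the path of the movement.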
Citation: @misc{Dua:2019 ,author = {Dua, Dheeru and Graff, Casey},year = {2017},title = {{UCI} Machine Learning Repository},url = {http://archive.ics.uci.edu/ml},institution = {University of California, Irvine, School of Information and Computer Sciences} }
Description: The Personae corpus was collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays, written by 145 different students.
License: CC Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
This dataset was collected for use within the DARPA Data Driven Discovery of Models (D3M) program.
ID: 313_spectrometer
Name: 313_spectrometer_dataset
Description: Part of the IRAS Low Resolution Spectrometer Database.
ID: 313_spectrometer_MIN_METADATA_dataset
Name: spectrometer
Description: Part of the IRAS Low Resolution Spectrometer Database.
The Infra-Red Astronomy Satellite (IRAS) was the first attempt to map the full sky at infra-red wavelengths. This could not be done from ground observatories because large portions of the infra-red spectrum are absorbed by the atmosphere. The primary observing program was the full high resolution sky mapping performed by scanning at 4 frequencies. The Low Resolution Observation (IRAS-LRS) program observed high intensity sources over two continuous spectral bands. This database derives from a subset of the higher quality LRS observations taken between 12h and 24h right ascension.
This database contains 531 high quality spectra derived from the IRAS-LRS database. The original data contained 100 spectral measurements in each of two overlapping bands. Of these, 44 blue band and 49 red band channels contain usable flux measurements. Only these are included here. The original spectral intensity values are compressed to 4 digits, and each spectrum includes 5 rescaling parameters. We have used the LRS specified algorithm to rescale these to units of spectral intensity (Janskys). Total intensity differences have been eliminated by normalizing each spectrum to a mean value of 5000.
This database was originally obtained for use in development and testing of our AutoClass system for Bayesian classification. We have not retained any results from this development, having concentrated our efforts on a 5425-element version of the same data. Our classifications were based upon simultaneous modeling of all 93 spectral intensities. With the larger database we were able to find classes that correspond well with known spectral types associated with particular stellar types. We also found classes that match with the spectra expected of certain stellar processes under investigation by Ames astronomers. These classes have considerably enlarged the set of stars being investigated by those researchers.
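The final normalization step can be sketched as follows. This illustrates only the rescaling of a spectrum to a mean of 5000; it does not reproduce the LRS rescaling algorithm, and the function name and input values are hypothetical.

```python
from typing import List, Sequence


def normalize_to_mean(spectrum: Sequence[float], target_mean: float = 5000.0) -> List[float]:
    """Scale a spectrum so that its mean equals target_mean."""
    if not spectrum:
        raise ValueError("spectrum must not be empty")
    mean = sum(spectrum) / len(spectrum)
    if mean == 0:
        raise ValueError("cannot normalize a spectrum with zero mean")
    # A single multiplicative factor removes total-intensity differences
    # while preserving the shape of the spectrum.
    scale = target_mean / mean
    return [value * scale for value in spectrum]
```

Applied to each 93-channel spectrum (44 blue band plus 49 red band values), this kind of scaling leaves only relative differences between channels.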
Citation: @misc{Dua:2019 ,author = {Dua, Dheeru and Graff, Casey},year = {2017},title = {{UCI} Machine Learning Repository},url = {http://archive.ics.uci.edu/ml},institution = {University of California, Irvine, School of Information and Computer Sciences} }
Citation: @inproceedings{Salamon:UrbanSound:ACMMM:14, Address = {Orlando, FL, USA}, Author = {Salamon, J. and Jacoby, C. and Bello, J. P.}, Booktitle = {22nd {ACM} International Conference on Multimedia ({ACM-MM'14})}, Month = {Nov.}, Title = {A Dataset and Taxonomy for Urban Sound Research}, Year = {2014}}
Citation: @inproceedings{fma_dataset, title = {FMA: A Dataset for Music Analysis}, author = {Defferrard, Micha\"el and Benzi, Kirell and Vandergheynst, Pierre and Bresson, Xavier}, booktitle = {18th International Society for Music Information Retrieval Conference}, year = {2017}, url = {https://arxiv.org/abs/1612.01840},}
-----------------END------------------
\ No newline at end of file
ID: 32_fma_MIN_METADATA_dataset
Name: fma dataset
Description: The data consists of a small subset of raw music files. The audio files differ in bit rate and length. All are in MP3 format.
License: Creative Commons Attribution 4.0 International License (CC BY 4.0)
Name: WikiQA: A Challenge Dataset for Open-Domain Question Answering
Description: The WikiQA dataset is a publicly available set of question and sentence (QS) pairs, collected and annotated for research on open-domain question answering.
License: MICROSOFT RESEARCH DATA LICENSE AGREEMENT
This dataset was collected for use within the DARPA Data Driven Discovery of Models (D3M) program.
ID: 4550_MiceProtein
Name: 4550_MiceProtein_dataset
Description: The data set consists of the expression levels of 77 proteins/protein modifications that produced detectable signals in the nuclear fraction of cortex. There are 38 control mice and 34 trisomic mice (Down syndrome), for a total of 72 mice. In the experiments, 15 measurements were registered of each protein per sample/mouse. Therefore, for control mice, there are 38x15, or 570 measurements, and for trisomic mice, there are 34x15, or 510 measurements. The dataset contains a total of 1080 measurements per protein. Each measurement can be considered as an independent sample/mouse.
ID: 4550_MiceProtein_MIN_METADATA_dataset
Name: MiceProtein
Description: The data set consists of the expression levels of 77 proteins/protein modifications that produced detectable signals in the nuclear fraction of cortex. There are 38 control mice and 34 trisomic mice (Down syndrome), for a total of 72 mice. In the experiments, 15 measurements were registered of each protein per sample/mouse. Therefore, for control mice, there are 38x15, or 570 measurements, and for trisomic mice, there are 34x15, or 510 measurements. The dataset contains a total of 1080 measurements per protein. Each measurement can be considered as an independent sample/mouse.
The eight classes of mice are described based on features such as genotype, behavior and treatment. According to genotype, mice can be control or trisomic. According to behavior, some mice have been stimulated to learn (context-shock) and others have not (shock-context). To assess the effect of the drug memantine in recovering the ability to learn in trisomic mice, some mice have been injected with the drug and others have not.
...
...
@@ -20,11 +17,21 @@ Classes:
```
The aim is to identify subsets of proteins that are discriminant between the classes.