A New Approach to Quantifying and Comparing Data Sets

This page is maintained by Gürol Canbek for the manuscript submitted to the Journal of Machine Learning Research with the title above.

dsdist.R

R script that tests how well various statistical distributions fit the existing feature frequency distributions (i.e., the truth).
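To make the idea of testing a distribution's fit concrete, here is a minimal sketch in base R. It is not the code in dsdist.R: the synthetic `freq` vector, the two candidate distributions (exponential and log-normal), and the use of the Kolmogorov-Smirnov statistic as the goodness-of-fit measure are all assumptions chosen for illustration only.

```r
# Synthetic stand-in for one feature frequency distribution.
# This is simulated data, NOT a data set from the study.
set.seed(1)
freq <- rexp(100, rate = 0.05)

# Closed-form maximum-likelihood estimates for two candidate distributions
# (the choice of candidates is an assumption for this sketch).
exp_rate   <- 1 / mean(freq)
lnorm_mean <- mean(log(freq))
lnorm_sd   <- sqrt(mean((log(freq) - lnorm_mean)^2))

# Kolmogorov-Smirnov distance between the sample and each fitted candidate.
# Only the statistic is compared: the p-values are not reliable when the
# parameters are estimated from the same sample.
ks_exp   <- ks.test(freq, "pexp", rate = exp_rate)
ks_lnorm <- ks.test(freq, "plnorm", meanlog = lnorm_mean, sdlog = lnorm_sd)

# Rank the candidates: a smaller statistic means a closer fit.
fit_summary <- data.frame(
  distribution = c("exponential", "log-normal"),
  ks_statistic = c(unname(ks_exp$statistic), unname(ks_lnorm$statistic))
)
print(fit_summary[order(fit_summary$ks_statistic), ])
```

To try the same comparison on real data, `freq` could be replaced with one column of observed feature frequencies. The actual script may use different candidate distributions and a different fit measure.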

The data sets

A total of 11 data sets of permission frequency distributions for benign and malign Android applications are provided.

Data Set  Author(s)                 Negative  Positive
DS0       (Lindorfer et al., 2014)  X         X
DS1       (Aswini and Vinod, 2014)  X         X
DS2       (Wang et al., 2014)       X         X
DS3       (Yerima et al., 2014)     X         X
DS4       (Jiang and Zhou, 2013)    None      X
DS5       (Peng et al., 2012)       X         X

Analysis Sheet

An OpenOffice spreadsheet document (best viewed with LibreOffice) containing the values, charts, and extra information related to the study will be provided after the manuscript is published.

You can access all the materials and detailed information at https://github.com/gurol/dsanalysis


Note: If you would like to use the code, data sets, methodology, or other materials provided, please cite our academic study and let us know. Thank you for your interest.

    Gürol Canbek, Seref Sagiroglu, and Tugba Taskaya Temizel. A New Approach to Quantifying and Comparing Data Sets. In Journal of Machine Learning Research (JMLR), Submitted, 2018.

Contact (to receive update notification): Gürol Canbek

Version 1.1, last updated on January 23rd, 2018

References for the Data Sets