Finding Critical Samples for Mining Big Data
| Content Provider | Semantic Scholar |
|---|---|
| Author | Sung, Andrew H.; Ribeiro, Bernardete M.; Liu, Qingzhong; Suryakumar, Divya |
| Copyright Year | 2014 |
| Abstract | To ensure success of big data analytics, effective data mining methods are essential, and in mining big data two of the most important problems are sampling and feature selection. Proper sampling combined with good feature selection can contribute to significant reductions of the datasets while obtaining satisfactory results in model building or knowledge discovery. The critical sampling size problem concerns whether, for a given dataset, there is a minimum number of data points that must be included in any sampling for a learning machine to achieve satisfactory performance. In this paper, the critical sampling problem is analyzed and shown to be intractable; in fact, its theoretical formulation and proof of intractability immediately follow that of the previously studied critical feature dimension problem. Next, heuristic methods for finding critical sampling of datasets are proposed, as it is expected that heuristic methods will be practically useful for sampling in big data analytic tasks. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://worldcomp-proceedings.com/proc/p2014/ABD7204.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
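
The abstract does not spell out the proposed heuristics, so the following is only an illustrative sketch of the general idea of searching for a critical sample size, not the authors' method. It assumes a hypothetical callback `score_at(n)` that trains a learning machine on a sample of `n` points and returns its performance, and it assumes that performance is roughly non-decreasing in `n`, which lets a binary search stand in for an exhaustive one. The names `find_critical_sample_size`, `score_at`, and `toy_learning_curve` are invented for this example.

```python
from typing import Callable, Optional


def find_critical_sample_size(
    n_total: int,
    score_at: Callable[[int], float],
    threshold: float,
) -> Optional[int]:
    """Binary-search the smallest sample size whose score meets `threshold`.

    Assumption (not from the paper): `score_at` is non-decreasing in the
    sample size, so the set of acceptable sizes is an interval ending at
    `n_total`. Returns None if even the full dataset misses the threshold.
    """
    if n_total < 1 or score_at(n_total) < threshold:
        return None
    lo, hi = 1, n_total
    while lo < hi:
        mid = (lo + hi) // 2
        if score_at(mid) >= threshold:
            hi = mid          # mid is acceptable; try smaller samples
        else:
            lo = mid + 1      # mid is too small; discard it
    return lo


def toy_learning_curve(n: int) -> float:
    """Hypothetical stand-in for training and evaluating on n sampled points."""
    return n / (n + 10)


if __name__ == "__main__":
    size = find_critical_sample_size(1000, toy_learning_curve, threshold=0.9)
    print("estimated critical sample size:", size)
```

In practice `score_at` would draw a random sample and evaluate a real model, so its output is noisy and the monotonicity assumption holds only approximately; averaging over several random draws per size is one way to make the search more stable.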