NDLI: Improvised methods for tackling big data stream mining challenges: case study of human activity recognition

Content Provider	Springer Nature Link
Author	Fong, Simon; Liu, Kexing; Cho, Kyungeun; Wong, Raymond; Mohammed, Sabah; Fiaidhi, Jinan
Copyright Year	2016
Abstract	Big data streams are a new hype but also a practical computational challenge, founded on the data streams that are prevalent in applications nowadays. It is well known that data streams originating from monitoring sensors accumulate continuously to very large volumes, making traditional batch-based model induction algorithms infeasible for real-time data mining or just-in-time data analytics. In this position paper, following a new data stream mining methodology, namely stream-based holistic analytics and reasoning in parallel (SHARP), a list of data analytic challenges as well as improvised methods are examined. In particular, two types of decision tree algorithms, batch-mode and incremental-mode, are tested on sensor data that represents a typical big data stream. We investigate whether and to what extent two improvised methods, outlier removal and balancing imbalanced class distributions, affect the prediction performance in big data stream mining. SHARP is founded on incremental learning, which does not require all the training data to be loaded into memory. This fundamental concept needs to be supported not only by the decision tree algorithms, but also by the other improvised methods, usually applied at the preprocessing stage. This paper sheds some light on this area, which is often overlooked by data analysts when it comes to big data stream mining.
Starting Page	3927
Ending Page	3959
Page Count	33
File Format	PDF
ISSN	0920-8542
Journal	The Journal of Supercomputing
Volume Number	72
Issue Number	10
e-ISSN	1573-0484
Language	English
Publisher	Springer US
Publisher Date	2016-02-16
Publisher Place	New York
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Data stream mining; Big data; Very fast decision tree; Resampling; Sensor data; Programming Languages, Compilers, Interpreters; Processor Architectures; Computer Science
Content Type	Text
Resource Type	Article
Subject	Theoretical Computer Science; Information Systems; Hardware and Architecture; Software

Similar Documents

Handling big data: research challenges and future directions

Fast kernel feature ranking using class separability for big data mining

A general perspective of Big Data: applications, tools, challenges and trends

StreamPI: a stream-parallel programming extension for object-oriented programming languages

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

Big data pre-processing methods with vehicle driving data using MapReduce techniques

Big data applications for healthcare: preface to special issue

Analysis of tree-based uncertain frequent pattern mining techniques without pattern losses

Location-based big data analytics for guessing the next Foursquare check-ins

Improvised methods for tackling big data stream mining challenges: case study of human activity recognition