Learning from Data Streams with Concept Drift
| Content Provider | Semantic Scholar |
|---|---|
| Author | Garnett, Roman; Roberts, Stephen J. |
| Copyright Year | 2008 |
| Abstract | Increasing access to incredibly large, nonstationary datasets and corresponding demands to analyse these data have led to the development of new online algorithms for performing machine learning on data streams. An important feature of real-world data streams is "concept drift," whereby the distributions underlying the data can change arbitrarily over time. The presence of concept drift in a data stream causes many classical data mining techniques to become unsuitable, and therefore new approaches must be developed in their place. In pursuit of this goal, we introduce the dynamic logistic regressor (DLR), a sequential Bayesian approach for performing binary classification on nonstationary data streams. We then show how the DLR framework can be extended to cope with missing observations and with missing and corrupted labels. We proceed to describe a new meta-algorithm for performing classification and regression on data streams with concept drift. The convex hull of receiver operating characteristic (ROC) curves has long been used for identifying potentially optimal classifiers. Unfortunately, the ROC curve does not perform as expected when learning from data streams exhibiting concept drift. We introduce a modification to the ROC curve that provides an easily maintainable online summary of a classifier's performance, even in the presence of concept drift. We similarly modify the recently introduced regression error characteristic (REC) curve, giving analogous dynamic summaries of online regressors. We then introduce a system for online ensemble learning utilizing these dynamic performance curves. Using the convex hulls of these curves, we develop a simple framework for supervised learning with drifting data streams. We present empirical evidence with real and simulated data that demonstrates that the proposed method performs better than selected previous solutions. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.cse.wustl.edu/~garnett/files/papers/garnett_thesis.pdf |
| Alternate Webpage(s) | https://www.cse.wustl.edu/~garnett/files/Garnett%20CV.pdf |
| Alternate Webpage(s) | http://www.robots.ox.ac.uk/~parg/pubs/TR-PARG-08-01.pdf |
| Alternate Webpage(s) | http://www.robots.ox.ac.uk/~parg/pubs/theses/RomanGarnett_thesis.pdf |
| Alternate Webpage(s) | http://www-kd.iai.uni-bonn.de/pubattachments/686/garnett_thesis.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |