CSE 599i: Online and Adaptive Machine Learning, Winter 2018, Lecture 14: Binary Classification: Disagreement-based Methods
| Field | Value |
|---|---|
| Content Provider | Semantic Scholar |
| Author | Jamieson, Kevin; Cano, Nieves; Rafieian, Omid; Barzegary, E.; Erion, Gabriel G.; Fatehi, Soraya; Ehsani, Kiana |
| Copyright Year | 2018 |
| Abstract | Binary classification has been one of the most prevalent problems in machine learning, as it relates to a wide range of empirical problems (e.g., whether or not a user clicks on an ad, likes a photo on a social network, etc.). In binary classification, we are given a set of n examples {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ R^d is the feature vector of the i-th example and y_i ∈ {−1, 1} is its label, drawn i.i.d. from some distribution D_XY, and the goal of the classifier is to infer the labels of unseen examples. Another important property of a classifier is how fast it learns: for a given hypothesis space, how fast the error goes down as a function of the number of observations n. In this lecture, we focus on the problem of binary classification and examine two approaches to the problem: passive and active learning. In passive learning, the goal of the learner is to infer an accurate predictor from labeled training data. The labeled training data are examples of input-output pairs (x, y), where the label y represents the outcome associated with the input x. In this setting, the supervised algorithm obtains the labels for a random subset of inputs and learns the classifier. In many situations, however, obtaining labels can be costly and time consuming. For example, in speech recognition, the speech signal is cheap to obtain, but labeling it requires a great deal of time and labor. This gives rise to the notion of active learning, which instead considers situations wherein we are given a set of unlabeled data points (i.e., each training example is an input x without an associated label y), and the learner is allowed to request the label y of any particular input x. The goal of the active learner is to infer an accurate classifier of labels from inputs without making too many queries. An active learner tries to get the most out of a limited budget by choosing its query points (points for which it asks for the label) in an intelligent and adaptive manner [1]. In this lecture, we start with the problem of passive learning in a realizable setting, where we assume a perfect classifier exists and the data is separable. We examine how fast the error goes down as we increase the sample size. We then consider a particular active learning setting wherein unlabeled data come as a stream, and for each incoming data point, the learner needs to decide whether or not to query its label. We introduce an active learning algorithm proposed by Cohn, Atlas, and Ladner (CAL henceforth) [2] and see how much we can improve the convergence rate compared with the passive one. CAL is an example of a broader class of algorithms called disagreement-based methods. The main idea behind these methods is that the active learner maintains a candidate set of hypotheses (often called a version space) and queries the label of a data point only if there is disagreement within this set on how to label the point. As such, each time a new label is seen, the current set of hypotheses that are still “in the running” shrinks. Figure 1, taken from Dasgupta [1], shows an example of a disagreement-based method (CAL) under the assumption that the data is linearly separable. As shown in Figure 1a, seven points have been labeled and a new point arrives. Figure 1b shows some of the hypotheses in the current version space, indicating that all these lines are consistent with the labeled data seen so far. Combining all the lines in the version space, we can form the region of disagreement, as shown in Figure 1c. Since the new point does not lie in the disagreement region, the learner is able to infer its label. Finally, we discuss the agnostic case, where a perfect separator does not exist and the best classifier has some noise. We introduce a disagreement-based algorithm proposed by Dasgupta, Hsu, and Monteleoni. |
| File Format | PDF, HTML |
| Alternate Webpage(s) | https://courses.cs.washington.edu/courses/cse599i/18wi/resources/lecture14/lecture14.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
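
The abstract raises two technical points that can be made concrete here; neither rendering below is quoted from the lecture notes. First, for the passive, realizable setting it asks how fast the error decays with the sample size n. As background only, the standard bound for a finite hypothesis class gives a roughly 1/n rate:

```latex
% Standard realizable-case bound for a finite hypothesis class (background, not taken from the lecture):
% with probability at least 1 - \delta over n i.i.d. labeled examples, every consistent h \in \mathcal{H} satisfies
\operatorname{err}_{D_{XY}}(h) \;\le\; \frac{\ln|\mathcal{H}| + \ln(1/\delta)}{n}.
```

Second, the CAL query rule described in the abstract (maintain a version space of consistent hypotheses, query a point's label only if the version space disagrees on it) is simple to illustrate. The sketch below is a toy Python illustration under assumptions not stated in the record: a finite class of one-dimensional threshold classifiers stands in for the version space, unlabeled points arrive as a stream of scalars, and `query_label` is a hypothetical oracle for the costly labeling step. It is not the lecture's implementation.

```python
# Toy sketch of a CAL-style disagreement-based active learner.
# Assumptions (not from the lecture notes): a small, explicit finite hypothesis
# class of 1-D threshold classifiers h_t(x) = sign(x - t), so the version space
# can be stored and pruned directly.

def make_threshold(t):
    """Threshold classifier on the real line: +1 at or to the right of t, -1 otherwise."""
    return lambda x: 1 if x >= t else -1

def cal_stream(stream, version_space, query_label):
    """Process a stream of unlabeled points with the CAL query rule.

    stream        : iterable of scalar feature values x
    version_space : list of hypotheses consistent with all labels seen so far
    query_label   : callable x -> true label in {-1, +1} (the costly oracle)
    Returns the number of label queries issued.
    """
    num_queries = 0
    for x in stream:
        predictions = {h(x) for h in version_space}
        if len(predictions) > 1:
            # The version space disagrees on x: pay for the label and prune.
            y = query_label(x)
            num_queries += 1
            version_space = [h for h in version_space if h(x) == y]
        else:
            # All remaining hypotheses agree: infer the label for free.
            y = predictions.pop()
    return num_queries

if __name__ == "__main__":
    # Separable toy stream: true threshold 0.37, candidate thresholds on a grid.
    import random
    random.seed(0)
    truth = make_threshold(0.37)
    hypotheses = [make_threshold(t / 100) for t in range(101)]
    points = [random.random() for _ in range(200)]
    print("labels queried:", cal_stream(points, hypotheses, truth))
```

On separable data like this toy stream, most points fall outside the region of disagreement, so their labels are inferred for free and only a small fraction of points are queried; this is the label-efficiency gain the abstract attributes to disagreement-based methods.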