CSE 599i: Online and Adaptive Machine Learning, Winter 2018, Lecture 14: Binary Classification: Disagreement-based Methods
| Field | Value |
|---|---|
| Content Provider | Semantic Scholar |
| Author | Jamieson, Kevin; Cano, Nieves; Rafieian, Omid; Barzegary, E.; Erion, Gabriel G.; Fatehi, Soraya; Ehsani, Kiana |
| Copyright Year | 2018 |
| Abstract | Binary classification has been one of the most prevalent problems in machine learning, as it relates to a wide range of empirical problems (e.g., whether or not a user clicks on an ad, likes a photo on a social network, etc.). In binary classification, we are given a set of n examples {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ R^d is the feature vector of the i-th example and y_i ∈ {−1, 1} is its label, drawn i.i.d. from some distribution D_XY, and the goal of the classifier is to infer the labels of unseen examples. Another important property of a classifier is how fast it learns: for a given hypothesis space, how fast the error goes down as a function of the number of observations n. In this lecture, we focus on the problem of binary classification and examine two approaches to the problem: passive and active learning. In passive learning, the goal of the learner is to infer an accurate predictor from labeled training data. The labeled training data are examples of input-output pairs (x, y), where the label y represents the outcome associated with the input x. In this setting, the supervised algorithm obtains the labels for a random subset of inputs and learns the classifier. In many situations, however, obtaining labels can be costly and time consuming. For example, in speech recognition, the speech signal is cheap to obtain, but labeling it requires a great deal of time and labor. This gives rise to the notion of active learning, which instead considers situations wherein we are given a set of unlabeled data points (i.e., each training example is an input x without an associated label y), and the learner is allowed to request the label y of any particular input x. The goal of the active learner is to infer an accurate classifier of labels from inputs without making too many queries. An active learner tries to get the most out of a limited budget by choosing its query points (points for which it asks for the label) in an intelligent and adaptive manner [1]. In this lecture, we start with the problem of passive learning in a realizable setting, where we assume a perfect classifier exists and the data is separable. We examine how fast the error goes down as we increase the sample size. We then consider a particular active learning setting wherein unlabeled data come as a stream, and for each incoming data point, the learner needs to decide whether or not to query its label. We introduce an active learning algorithm proposed by Cohn, Atlas, and Ladner (CAL henceforth) [2] and see how much we can improve the convergence rate compared with the passive one. CAL is an example of a broader class of algorithms called disagreement-based methods. The main idea behind these methods is that the active learner maintains a candidate set of hypotheses (often called a version space) and queries the label of a data point only if there is disagreement within this set on how to label the point. As such, each time a new label is seen, the current set of hypotheses that are still “in the running” shrinks. Figure 1, taken from Dasgupta [1], shows an example of a disagreement-based method (CAL) under the assumption that the data is linearly separable. As shown in Figure 1a, seven points have been labeled and a new point arrives. Figure 1b shows some of the hypotheses in the current version space, indicating that all these lines are consistent with the labeled data seen so far. Combining all the lines in the version space, we can form the region of disagreement, as shown in Figure 1c. Since the new point does not lie in the disagreement region, the learner is able to infer its label. Finally, we discuss the agnostic case, where a perfect separator does not exist and the best classifier has some noise. We introduce a disagreement-based algorithm proposed by Dasgupta, Hsu, and Monteleoni. |
| File Format | PDF, HTML |
| Alternate Webpage(s) | https://courses.cs.washington.edu/courses/cse599i/18wi/resources/lecture14/lecture14.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
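
The abstract raises two technical points that can be made concrete here; neither rendering below is quoted from the lecture notes. First, for the passive, realizable setting it asks how fast the error decays with the sample size n. As background only, the standard bound for a finite hypothesis class gives a roughly 1/n rate:

```latex
% Standard realizable-case bound for a finite hypothesis class (background, not taken from the lecture):
% with probability at least 1 - \delta over n i.i.d. labeled examples, every consistent h \in \mathcal{H} satisfies
\operatorname{err}_{D_{XY}}(h) \;\le\; \frac{\ln|\mathcal{H}| + \ln(1/\delta)}{n}.
```

Second, the CAL query rule described in the abstract (maintain a version space of consistent hypotheses, query a point's label only if the version space disagrees on it) is simple to illustrate. The sketch below is a toy Python illustration under assumptions not stated in the record: a finite class of one-dimensional threshold classifiers stands in for the version space, unlabeled points arrive as a stream of scalars, and `query_label` is a hypothetical oracle for the costly labeling step. It is not the lecture's implementation.

```python
# Toy sketch of a CAL-style disagreement-based active learner.
# Assumptions (not from the lecture notes): a small, explicit finite hypothesis
# class of 1-D threshold classifiers h_t(x) = sign(x - t), so the version space
# can be stored and pruned directly.

def make_threshold(t):
    """Threshold classifier on the real line: +1 at or to the right of t, -1 otherwise."""
    return lambda x: 1 if x >= t else -1

def cal_stream(stream, version_space, query_label):
    """Process a stream of unlabeled points with the CAL query rule.

    stream        : iterable of scalar feature values x
    version_space : list of hypotheses consistent with all labels seen so far
    query_label   : callable x -> true label in {-1, +1} (the costly oracle)
    Returns the number of label queries issued.
    """
    num_queries = 0
    for x in stream:
        predictions = {h(x) for h in version_space}
        if len(predictions) > 1:
            # The version space disagrees on x: pay for the label and prune.
            y = query_label(x)
            num_queries += 1
            version_space = [h for h in version_space if h(x) == y]
        else:
            # All remaining hypotheses agree: infer the label for free.
            y = predictions.pop()
    return num_queries

if __name__ == "__main__":
    # Separable toy stream: true threshold 0.37, candidate thresholds on a grid.
    import random
    random.seed(0)
    truth = make_threshold(0.37)
    hypotheses = [make_threshold(t / 100) for t in range(101)]
    points = [random.random() for _ in range(200)]
    print("labels queried:", cal_stream(points, hypotheses, truth))
```

On separable data like this toy stream, most points fall outside the region of disagreement, so their labels are inferred for free and only a small fraction of points are queried; this is the label-efficiency gain the abstract attributes to disagreement-based methods.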