NDLI: Parallel boosted regression trees for web search ranking

How can scientists help to spread the web to all sections of the society

Sparse hidden-dynamics conditional random fields for user intent understanding

A unified framework for recommending diverse and relevant queries

Model characterization curves for federated search using click-logs: predicting user engagement metrics for the span of feasible operating points

Query segmentation revisited

An expressive mechanism for auctions on the web

Here, there, and everywhere: correlated online behaviors can lead to overestimates of the effects of advertising

ARROW: GenerAting SignatuRes to Detect DRive-By DOWnloads

Semi-supervised truth discovery

Geographical topic discovery and comparison

Learning to re-rank: query-dependent image re-ranking using click data

Pay as you browse: microcomputations as micropayments in web-based services

A word at a time: computing word relatedness using temporal semantic analysis

Learning to rank with multiple objective functions

Evaluating new search engine configurations with pre-existing judgments and clicks

SEISA: set expansion by iterative similarity aggregation

Track globally, deliver locally: improving content delivery networks by tracking geographic social cascades

Inverted index compression via online document routing

We know who you followed last summer: inferring social link creation times in twitter

Dynamics of bidding in a P2P lending service: effects of herding and predicting loan success

Efficient k-nearest neighbor graph construction for generic similarity measures

Counting triangles and the curse of the last reducer

EP-SPARQL: a unified language for event processing and stream reasoning

Limiting the spread of misinformation in social networks

Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter

Milgram-routing in social networks

Supporting synchronous social q&a throughout the question lifecycle

A distributed framework for reliable and efficient service choreographies

Designing the web for an open society

Characterizing search intent diversity into click models

Improving recommendation for long-tail queries via templates

Generalized link suggestions via web site clustering

Context-sensitive query auto-completion

Incentivizing high-quality user-generated content

Adaptive policies for selecting groupon style chunked reward ads in a stochastic knapsack framework

Prophiler: a fast filter for the large-scale detection of malicious web pages

SourceRank: relevance and trust assessment for deep web sources based on inter-source agreement

The web of topics: discovering the topology of topic evolution in a corpus

Video summarization via transferrable structured learning

Consideration set generation in commerce search

Automatic construction of a context-aware sentiment lexicon: an optimization approach

A stochastic learning-to-rank algorithm and its application to contextual advertising

On the informativeness of cascade and intent-aware effectiveness measures

Highly efficient algorithms for structural clustering of large websites

Measuring a commercial content delivery network

Efficiently evaluating graph constraints in content-based publish/subscribe

Modeling the temporal dynamics of social rating networks using bidirectional effects of social relations and rating patterns

Finding hierarchy in directed online social networks

Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks

Network bucket testing

A better uncle for OWL: nominal schemas for integrating rules and ontologies

Information credibility on twitter

Who says what to whom on twitter

Information spreading in context

The design and usage of tentative events for time-based social coordination in the enterprise

Choreography conformance via synchronizability

Games, algorithms, and the Internet

Addressing people's information needs directly in a web search result page

Learning to model relatedness for news recommendation

A self-training approach for resolving object coreference on the semantic web

Online spelling correction for query completion

Buy-it-now or take-a-chance: a simple sequential screening mechanism

A game theoretic formulation of the service provisioning problem in cloud systems

Heat-seeking honeypots: design and experience

Search result diversity for informational queries

Unified analysis of streaming news

Towards semantic knowledge propagation from text corpus to web images

Towards a theory model for product search

Web scale NLP: a case study on url word breaking

Parallel boosted regression trees for web search ranking

Pragmatic evaluation of folksonomies

SCAD: collective discovery of attribute values

Turkalytics: analytics for human computation

FACTO: a fact lookup engine based on web tables

Like like alike: joint friendship and interest propagation in social networks

Finding the bias and prestige of nodes in networks based on trust scores

Estimating sizes of social networks via biased sampling

HyperANF: approximating the neighbourhood function of very large graphs on a budget

Rewriting queries on SPARQL views

SafeVchat: detecting obscene content and misbehaving users in online video chat services

we.b: the web of short urls

Mark my words!: linguistic style accommodation in social media

A case for query by image and text content: searching computer help using screenshots and keywords

Statically locating web application bugs caused by asynchronous calls

Parallel boosted regression trees for web search ranking

Content Provider	ACM Digital Library
Author	Agrawal, Kunal; Paykin, Jennifer; Tyree, Stephen; Weinberger, Kilian Q.
Abstract	Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these to build one layer of a regression tree, and then sends this layer to the workers, allowing the workers to build histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance. Since this approach is based on data partitioning and requires a small amount of communication, it generalizes to distributed and shared memory machines, as well as clouds. We present experimental results on both shared memory machines and clusters for two large-scale web search ranking data sets. We demonstrate that the loss in accuracy induced by the histogram approximation in the regression tree creation can be compensated for through slightly deeper trees. As a result, we see no significant loss in accuracy on the Yahoo data sets and a very small reduction in accuracy for the Microsoft LETOR data. In addition, on shared memory machines, we obtain almost perfect linear speed-up with up to about 48 cores on the large data sets. On distributed memory machines, we get a speedup of 25 with 32 processors. Because of data partitioning, our approach can scale to even larger data sets, on which one can reasonably expect even higher speedups.
Starting Page	387
Ending Page	396
Page Count	10
File Format	PDF
ISBN	9781450306324
DOI	10.1145/1963405.1963461
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2011-03-28
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Parallel computing; Ranking; Machine learning; Boosting; Boosted regression trees; Distributed computing; Web search
Content Type	Text
Resource Type	Article
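
The master-worker step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed equal-width bins, the single feature, and the function names build_histogram, merge_histograms, and best_split are all assumptions made for illustration, and the abstract does not say which histogram scheme the paper uses. Each simulated worker summarizes its partition as per-bin counts and target sums, and the master adds the histograms together and chooses a split threshold from the merged summary alone.

```python
from typing import List, Optional, Sequence, Tuple

# Per-bin summary: (number of samples, sum of their targets).
Histogram = List[Tuple[int, float]]


def build_histogram(xs: Sequence[float], ys: Sequence[float],
                    lo: float, hi: float, n_bins: int) -> Histogram:
    """Worker step (assumed form): summarize one partition for one feature."""
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    width = (hi - lo) / n_bins
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into last bin
        counts[b] += 1
        sums[b] += y
    return list(zip(counts, sums))


def merge_histograms(hists: Sequence[Histogram]) -> Histogram:
    """Master step: add the workers' histograms bin by bin."""
    return [(sum(c for c, _ in bins), sum(s for _, s in bins))
            for bins in zip(*hists)]


def best_split(hist: Histogram, lo: float, hi: float) -> Optional[float]:
    """Master step: choose the bin boundary with the lowest squared error.

    For a two-leaf split, minimizing squared error is equivalent to
    maximizing S_L^2 / n_L + S_R^2 / n_R, which needs only counts and sums.
    """
    width = (hi - lo) / len(hist)
    total_n = sum(c for c, _ in hist)
    total_s = sum(s for _, s in hist)
    best_score, best_threshold = float("-inf"), None
    left_n, left_s = 0, 0.0
    for b in range(len(hist) - 1):
        left_n += hist[b][0]
        left_s += hist[b][1]
        right_n, right_s = total_n - left_n, total_s - left_s
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        score = left_s ** 2 / left_n + right_s ** 2 / right_n
        if score > best_score:
            best_score, best_threshold = score, lo + (b + 1) * width
    return best_threshold


# Two hypothetical worker partitions of a single feature in [0, 1].
part_a = ([0.1, 0.2, 0.8], [1.0, 1.0, 5.0])
part_b = ([0.3, 0.7, 0.9], [1.0, 5.0, 5.0])
hists = [build_histogram(xs, ys, 0.0, 1.0, 4) for xs, ys in (part_a, part_b)]
merged = merge_histograms(hists)
threshold = best_split(merged, 0.0, 1.0)
print(merged, threshold)
```

Only the fixed-size histograms cross the worker-master boundary in this sketch, never the raw samples, which is consistent with the abstract's point that the approach needs little communication. The threshold can fall only on a bin boundary, which is one way to picture the histogram approximation that the abstract says slightly deeper trees compensate for.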
