NDLI: SOGOU-2012-CRAWL: A Crawl of Search Results in the Sogou 2012 Chinese Query Log

Understanding Human Language: Can NLP and Deep Learning Help?

Big Data in Climate: Opportunities and Challenges for Machine Learning

Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015

Searching by Talking: Analysis of Voice Queries on Mobile Web Search

Document Retrieval Using Entity-Based Language Models

Generalized BROOF-L2R: A General Framework for Learning to Rank Based on Boosting and Random Forests

On Effective Personalized Music Retrieval by Exploring Online User Behaviors

Explainable User Clustering in Short Text Streams

Learning Query and Document Relevance from a Web-scale Click Graph

Novelty based Ranking of Human Answers for Community Questions

Transfer Learning for Cross-Lingual Sentiment Classification with Weakly Shared Deep Neural Networks

Leveraging Context-Free Grammar for Efficient Inverted Index Compression

Learning to Rank Features for Recommendation over Multiple Categories

Understanding Information Need: An fMRI Study

R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities

Modeling Document Novelty with Neural Tensor Network for Search Result Diversification

Robust and Collective Entity Disambiguation through Semantic Embeddings

When Watson Went to Work: Leveraging Cognitive Computing in the Real World

When does Relevance Mean Usefulness and User Satisfaction in Web Search?

Event Digest: A Holistic View on Past Events

Building a Self-Learning Search Engine: From Research to Business

Contextual Bandits in a Collaborative Environment

Leveraging User Interaction Signals for Web Image Search

Principles for the Design of Online A/B Metrics

Predicting User Engagement with Direct Displays Using Mouse Cursor Information

A Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation

A Complete & Comprehensive Movie Review Dataset (CCMR)

A Dynamic Recurrent Model for Next Basket Recommendation

A Platform for Streaming Push Notifications to Mobile Assessors

A Novel Approach to Define and Model Contextual Features in Recommender Systems

Collaborative Information Seeking: Art and Science of Achieving 1+1>2 in IR

Third International Workshop on Gamification for Information Retrieval (GamifIR'16)

Bayesian Performance Comparison of Text Classifiers

Predicting User Satisfaction with Intelligent Assistants

Engineering Quality and Reliability in Technology-Assisted Review

An Optimization Framework for Remapping and Reweighting Noisy Relevance Labels

Semantification of Identifiers in Mathematics for Better Math Information Retrieval

Topic Modeling for Short Texts with Auxiliary Word Embeddings

Click-based Hot Fixes for Underperforming Torso Queries

That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search

Query to Knowledge: Unsupervised Entity Extraction from Shopping Queries using Adaptor Grammars

Fast and Compact Hamming Distance Index

How Much Novelty is Relevant?: It Depends on Your Curiosity

User Behavior in Asynchronous Slow Search

Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

ScentBar: A Query Suggestion Interface Visualizing the Amount of Missed Relevant Information for Intrinsically Diverse Search

Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from Knowledge Graph

Ask Your TV: Real-Time Question Answering with Recurrent Neural Networks

How Many Workers to Ask?: Adaptive Exploration for Collecting High Quality Labels

Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events

Sedano: A News Stream Processor for Business

Collaborative Filtering Bandits

Self-Paced Cross-Modal Subspace Matching

Visual Recommendation Use Case for an Online Marketplace Platform: allegro.pl

Search Result Prefetching Using Cursor Movement

Improved Caching Techniques for Large-Scale Image Hosting Services

A Cross-Platform Collection of Social Network Profiles

A Simple Enhancement for Ad-hoc Information Retrieval via Topic Modelling

A Visual Analytics Approach for What-If Analysis of Information Retrieval Systems

A Study of Information Seeking Behavior Using Physical and Online Explorations

Constructing and Mining Web-scale Knowledge Graphs

HIA'16: The 2nd International Workshop on Heterogeneous Information Access at SIGIR 2016

A General Linear Mixed Models Approach to Study System Component Effects

Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System

A Sequential Decision Formulation of the Interface Card Model for Interactive IR

Learning to Rank with Selection Bias in Personal Search

Multi-Stage Math Formula Search: Using Appearance-Based Similarity Metrics at Scale

Interleaved Evaluation for Retrospective Summarization and Prospective Notification on Document Streams

A Context-aware Time Model for Web Search

When a Knowledge Base Is Not Enough: Question Answering over Knowledge Bases with External Text Data

Learning for Efficient Supervised Query Expansion via Two-stage Feature Selection

Fast First-Phase Candidate Generation for Cascading Rankers

Discrete Collaborative Filtering

Going back in Time: An Investigation of Social Media Re-finding

Retrieving Non-Redundant Questions to Summarize a Product Review

Evaluating Search Result Diversity using Intent Hierarchies

Hierarchical Random Walk Inference in Knowledge Graphs

Amazon Search: The Joy of Ranking Products

Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines

GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams

Ranking Financial Tweets

Fast Matrix Factorization for Online Recommendation with Implicit Feedback

Composite Correlation Quantization for Efficient Multimodal Retrieval

AOL's Named Entity Resolver: Solving Disambiguation via Document Strongly Connected Components and Ad-Hoc Edges Construction

Predicting Search User Examination with Visual Saliency

A Test Collection for Matching Patients to Clinical Trials

An Empirical Study of Learning to Rank for Entity Search

An Architecture for Privacy-Preserving and Replicable High-Recall Retrieval Experiments

Appearance-Based Retrieval of Mathematical Notation in Documents and Lecture Videos

Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement

Medical Information Search Workshop (MEDIR)

Learning to Rank Personalized Search Results in Professional Networks

The Data Stack in Information Retrieval

ArabicWeb16: A New Crawl for Today's Arabic Web

An Exploration of Evaluation Metrics for Mobile Push Notifications

Analysing Temporal Evolution of Interlingual Wikipedia Article Pairs

Beyond Topical Relevance: Studying Understandability and Reliability in Consumer Health Search

Deep Learning for Information Retrieval

Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval

Building Test Collections for Evaluating Temporal IR

An Improved Multileaving Algorithm for Online Ranker Evaluation

Cobwebs from the Past and Present: Extracting Large Social Networks using Internet Archive Data

Enhancing Information Retrieval with Adapted Word Embedding

From Design to Analysis: Conducting Controlled Laboratory Experiments with Users

Privacy-Preserving IR 2016: Differential Privacy, Search, and Social Media

DAJEE: A Dataset of Joint Educational Entities for Information Retrieval in Technology Enhanced Learning

An Unsupervised Approach to Anomaly Detection in Music Datasets

Context-Sensitive Auto-Completion for Searching with Entities and Categories

Fairness in Information Retrieval

Instant Search: A Hands-on Tutorial

Search as Learning (SAL) Workshop 2016

Evaluating Retrieval over Sessions: The TREC Session Track 2011-2014

Anonymizing Query Logs by Differential Privacy

EAIMS: Emergency Analysis Identification and Management System

Going Beyond Relevance: Incorporating Effort in Information Retrieval

Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial

SIGIR 2016 Workshop WebQA II: Web Question Answering Beyond Factoids

EveTAR: A New Test Collection for Event Detection in Arabic Tweets

Audio Features Affected by Music Expressiveness: Experimental Setup and Preliminary Results on Tuba Players

Expedition: A Time-Aware Exploratory Search System Designed for Scholars

Measuring Interestingness of Political Documents

Question Answering with Knowledge Base, Web and Beyond

GNMID14: A Collection of 110 Million Global Music Identification Matches

Automatic Identification and Contextual Reformulation of Implicit System-Related Queries

iGlasses: A Novel Recommendation System for Best-fit Glasses

Modeling User Feedback in Dynamic Search and Browsing

Scalability and Efficiency Challenges in Large-Scale Web Search Engines

Longitudinal Navigation Log Data on a Large Web Domain

Axiomatic Analysis for Improving the Log-Logistic Feedback Model

InfoScout: An Interactive, Entity Centric, Person Search Tool

Modelling User Search Behaviour Based on Process

Simulation of Interaction: A Tutorial on Modelling and Simulating User Interaction and Search Behaviour

New Collection Announcement: Focused Retrieval Over the Web

Balancing Relevance Criteria through Multi-Objective Optimization

InLook: Revisiting Email Search Experience

Retrievability: An Independent Evaluation Measure

Succinct Data Structures in Information Retrieval: Theory and Practice

NTCIR Lifelog: The First Test Collection for Lifelog Research

Build Emotion Lexicon from the Mood of Crowd via Topic-Assisted Joint Non-negative Matrix Factorization

Interacting with Financial Data using Natural Language

Significant Words Representations of Entities

Temporal Information Retrieval

SOGOU-2012-CRAWL: A Crawl of Search Results in the Sogou 2012 Chinese Query Log

Burst Detection in Social Media Streams for Tracking Interest Profiles in Real Time

LONLIES: Estimating Property Values for Long Tail Entities

Time-Quality Trade-offs in Search

The BOLT IR Test Collections of Multilingual Passage Retrieval from Discussion Forums

Cluster-based Joint Matrix Factorization Hashing for Cross-Modal Retrieval

Personalised News and Blog Recommendations based on User Location, Facebook and Twitter User Profiling

Torii: Attribute-based Polarity Analysis with Big Datasets

The Factoid Queries Collection

Collaborative Ranking with Social Relationships for Top-N Recommendations

PULP: A System for Exploratory Search of Scientific Literature

User Interaction in Mobile Web Search

The LExR Collection for Expertise Retrieval in Academia

Community-based Cyberreading for Information Understanding

SECC: A Novel Search Engine Interface with Live Chat Channel

UQV100: A Test Collection with Query Variability

Computational Creativity Based Video Recommendation

Simulating Interactive Information Retrieval: SimIIR: A Framework for the Simulation of Interaction

Controversy Detection in Wikipedia Using Collective Classification

The ComeWithMe System for Searching and Ranking Activity-Based Carpooling Rides

Discovering Author Interest Evolution in Topic Modeling

ThingSeek: A Crawler and Search Engine for the Internet of Things

Distributional Random Oversampling for Imbalanced Text Classification

Tweetviz: Visualizing Tweets for Business Intelligence

Doc2Sent2Vec: A Novel Two-Phase Approach for Learning Document Representation

Where the Event Lies: Predicting Event Occurrence in Textual Documents

Dynamically Integrating Item Exposure with Rating Prediction in Collaborative Filtering

Effective Trend Detection within a Dynamic Search Context

Enhancing First Story Detection using Word Embeddings

Examining the Coherence of the Top Ranked Tweet Topics

Explicit In Situ User Feedback for Web Search Results

Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles

Exploiting Semantic Coherence Features for Information Retrieval

Extracting Information Seeking Intentions for Web Search Sessions

First Story Detection using Multiple Nearest Neighbors

Health Monitoring on Social Media over Time

How Informative is a Term?: Dispersion as a measure of Term Specificity

Identifying Careless Workers in Crowdsourcing Platforms: A Game Theory Approach

Impact of Review-Set Selection on Human Assessment for Text Classification

Improving Automated Controversy Detection on the Web

Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval

Improving Retrieval Quality Using Pseudo Relevance Feedback in Content-Based Image Retrieval

Ingrams: A Neuropsychological Explanation For Why People Search

Investment Recommendation using Investor Opinions in Social Media

"Is Sven Seven?": A Search Intent Module for Children

Is This Your Final Answer?: Evaluating the Effect of Answers on Good Abandonment in Mobile Search

Jointly Modeling Review Content and Aspect Ratings for Review Rating Prediction

Learning to Project and Binarise for Hashing Based Approximate Nearest Neighbour Search

Linking Organizational Social Network Profiles

Load-Balancing in Distributed Selective Search

Multi-Rate Deep Learning for Temporal Recommendation

Network-Aware Recommendations of Novel Tweets

Not All Links Are Created Equal: An Adaptive Embedding Approach for Social Personalized Ranking

On a Topic Model for Sentences

On Information-Theoretic Document-Person Associations for Expert Search in Academia

On the Applicability of Delicious for Temporal Search on Web Archives

On the Effectiveness of Contextualisation Techniques in Spoken Query Spoken Content Retrieval

Ordinal Text Quantification

Pearson Rank: A Head-Weighted Gap-Sensitive Score-Based Correlation Coefficient

Polarized User and Topic Tracking in Twitter

Post-Learning Optimization of Tree Ensembles for Efficient Ranking

Quit While Ahead: Evaluating Truncated Rankings

Quote Recommendation in Dialogue using Deep Neural Network

Ranking Documents Through Stochastic Sampling on Bayesian Network-based Models: A Pilot Study

Ranking Health Web Pages with Relevance and Understandability

Rethinking the Cost of Information Search Behavior

Retrievability of Code Mixed Microblogs

Retweeting Behavior Prediction Based on One-Class Collaborative Filtering in Social Networks

Sampling Strategies and Active Learning for Volume Estimation

Search-based Evaluation from Truth Transcripts for Voice Search Applications

Seeking Serendipity: A Living Lab Approach to Understanding Creative Retrieval in Broadcast Media Production

Selectively Personalizing Query Auto-Completion

SG++: Word Representation with Sentiment and Negation for Twitter Sentiment Classification

SGT Framework: Social, Geographical and Temporal Relevance for Recreational Queries in Web Search

SimCC-AT: A Method to Compute Similarity of Scientific Papers with Automatic Parameter Tuning

Simple Dynamic Emission Strategies for Microblog Filtering

Subspace Clustering Based Tag Sharing for Inductive Tag Matrix Refinement with Complex Errors

Temporal Query Intent Disambiguation using Time-Series Data

To Blend or Not to Blend?: Perceptual Speed, Visual Memory and Aggregated Search

Topic Model based Privacy Protection in Personalized Web Search

Topic Quality Metrics Based on Distributed Word Representations

Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance

Tracking Sentiment by Time Series Analysis

Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

Two Sample T-tests for IR Evaluation: Student or Welch?

Uncovering Task Based Behavioral Heterogeneities in Online Search Behavior

Understanding Website Behavior based on User Agent

Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data

Utilizing Focused Relevance Feedback

What Makes a Query Temporally Sensitive?

Which Information Sources are More Effective and Reliable in Video Search

Why do you Think this Query is Difficult?: A User Study on Human Query Prediction

SOGOU-2012-CRAWL: A Crawl of Search Results in the Sogou 2012 Chinese Query Log

Content Provider	ACM Digital Library
Author	Whiting, Stewart; Alonso, Omar; Jose, Joemon M.
Abstract	In 2012, Sogou, a major Chinese web search engine, released a large-scale query log containing 43.5M user interactions, including submitted queries and clicked web page search results. This query log offers a deep sample of queries over a two-day period from 30th December 2011 to 1st January 2012. In August 2013, we identified 1.4M predominantly Chinese-language unique search result URLs that were clicked at least three times in this query log. We crawled the HTML content of these URLs to construct the supplementary SOGOU-2012-CRAWL dataset, which we release in this work. A real large-scale query log with an accompanying crawl such as this offers several opportunities for reproducible information retrieval (IR) research, including query classification, intent modelling and indexing strategy. In this paper, we first detail the construction and characteristics of the query log and crawl dataset. Following this, to demonstrate potential applications, we use the crawl to indicatively analyse various time-based patterns in web content and search behaviour. In particular, we study the distribution of language-independent date expressions in the crawled web content. Based on this, we propose a simple approach for modelling the past/present/future temporal intent of queries based on the date the query was submitted by the user and the dates appearing in the clicked search results. We observe several prominent temporal patterns which may lead to novel time-aware IR approaches.
Starting Page	709
Ending Page	712
Page Count	4
File Format	PDF
ISBN	9781450340694
DOI	10.1145/2911451.2914668
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2016-07-07
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Crawl; Temporal; Chinese; Intent; Time
Content Type	Text
Resource Type	Article
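
The abstract describes modelling the past/present/future temporal intent of a query from its submission date and the dates appearing in the clicked search results, but it does not give the procedure. The following is only a minimal sketch of one way such a comparison could work, not the authors' method. The function name `classify_temporal_intent`, the `window_days` tolerance, the majority vote and the tie-breaking order are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def classify_temporal_intent(query_date, clicked_dates, window_days=1):
    """Label a query as 'past', 'present' or 'future' (hypothetical helper).

    query_date:    date on which the query was submitted
    clicked_dates: dates extracted from the clicked result pages
    window_days:   assumed tolerance; dates within this many days of the
                   query date count as 'present'
    """
    votes = Counter()
    for d in clicked_dates:
        delta = (d - query_date).days
        if delta < -window_days:
            votes["past"] += 1
        elif delta > window_days:
            votes["future"] += 1
        else:
            votes["present"] += 1

    if not votes:
        return "unknown"  # no date expressions found in the clicked pages

    # Majority vote; ties fall back to this assumed preference order.
    order = ["present", "past", "future"]
    return max(order, key=lambda label: (votes[label], -order.index(label)))


if __name__ == "__main__":
    submitted = date(2011, 12, 31)
    page_dates = [date(2012, 1, 23), date(2012, 2, 1), date(2011, 12, 31)]
    print(classify_temporal_intent(submitted, page_dates))
```

In this sketch, a query submitted on 31 December 2011 whose clicked pages mostly mention later dates would be labelled as future-oriented. A real analysis would also need a date-expression extractor for the crawled HTML, which is omitted here.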
