NDLI: Towards scalable summarization and visualization of large text corpora (abstract only)

Edgar F. Codd Innovations Award Talk

Calvin: fast distributed transactions for partitioned database systems

Parallel main-memory indexing for moving-object query and update workloads

Sample-driven schema mapping

Interactive regret minimization

Managing large dynamic graphs efficiently

Skimmer: rapid scrolling of relational query results

bLSM: a general purpose log structured merge tree

High-performance complex event processing over XML streams

MaskIt: privately releasing user context streams for personalized mobile applications

Towards a unified architecture for in-RDBMS analytics

CrowdScreen: algorithms for filtering data with humans

Processing a large number of continuous preference top-k queries

Temporal alignment

Aggregate suppression for enterprise search engines

A model-based approach to attributed graph clustering

Locality-sensitive hashing scheme based on dynamic collision counting

Analytic database technologies for a new kind of user: the data enthusiast

Mob data sourcing

Automatic web-scale information extraction

Sindbad: a location-based social networking system

Shark: fast data analysis using coarse-grained distributed memory

Amazon dynamoDB: a seamlessly scalable non-relational database service

The value of social media data in enterprise applications

Query optimization in microsoft SQL server PDW

TAO: how facebook serves the social graph

Dynamic workload driven data integration in tableau

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster

Declarative web application development: encapsulating dynamic JavaScript widgets (abstract only)

SIGMOD Contributions Award Talk

Advanced partitioning techniques for massively distributed computation

Divergent physical design tuning for replicated databases

Can we beat the prefix filtering?: an adaptive framework for similarity join and search

MCJoin: a memory-constrained join for column-store main-memory databases

Query preserving graph compression

Efficient spatial sampling of large geographical tables

Skeleton automata for FPGAs: reconfiguring without reconstructing

Prediction-based geometric monitoring over distributed data streams

Authenticating location-based services without compromising location privacy

Tiresias: the database oracle for how-to queries

Local structure and determinism in probabilistic databases

Optimal top-k generation of attribute combinations based on ranked lists

A highway-centric labeling approach for answering distance queries on large sparse graphs

Probase: a probabilistic taxonomy for text understanding

Towards effective partition management for large graphs

Efficient external-memory bisimulation on DAGs

Symbiosis in scale out networking and data management

Managing and mining large graphs: patterns and algorithms

Just-in-time information extraction using extraction views

MAQSA: a system for social analytics on news

Exploiting MapReduce-based similarity joins

Efficient transaction processing in SAP HANA database: the end of a column store myth

Anatomy of a gift recommendation engine powered by social media

F1: the fault-tolerant distributed RDBMS supporting google's ad business

Large-scale machine learning at twitter

Finding related tables

Adaptive optimizations of recursive queries in teradata

Towards scalable summarization and visualization of large text corpora (abstract only)

Test Of Time Award Talk: Executing SQL over Encrypted Data in the Database-Service-Provider Model

SkewTune: mitigating skew in mapreduce applications

Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems

InfoGather: entity augmentation and attribute discovery by holistic matching with web tables

Holistic optimization by prefetching query results

SCARAB: scaling reachability computation on large graphs

Declarative error management for robust data-intensive applications

NoDB: efficient query execution on raw data files

Online windowed subsequence matching over probabilistic sequences

Effective caching of shortest paths for location-based services

GUPT: privacy preserving data analysis made easy

So who won?: dynamic max discovery with the crowd

Top-k bounded diversification

Efficient processing of distance queries in large graphs: a vertex cover approach

Optimizing index for taxonomy keyword search

TreeSpan: efficiently computing similarity all-matching

Materialized view selection for XQuery workloads

Managing and mining large graphs: systems and implementations

ColumbuScout: towards building local search engines over large databases

Surfacing time-critical insights from social media

GLADE: big data analytics made easy

Walnut: a unified cloud object store

Designing a scalable crowdsourcing platform

Oracle in-database hadoop: when mapreduce meets RDBMS

Recurring job optimization in scope

Optimizing analytic data flows for multiple execution engines

From x100 to vectorwise: opportunities, challenges and things most researchers do not think about

Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)

SIGMOD Jim Gray Doctoral Dissertation Award Talk

Computational reproducibility: state-of-the-art, challenges, and database research opportunities

SOFIA SEARCH: a tool for automating related-work search

Taagle: efficient, personalized search in collaborative tagging networks

ReStore: reusing results of MapReduce jobs in pig

DP-tree: indexing multi-dimensional data under differential privacy (abstract only)

Database techniques for linked data management

RACE: real-time applications over cloud-edge

PrefDB: bringing preferences closer to the DBMS

Clydesdale: structured data processing on hadoop

Temporal provenance discovery in micro-blog message streams (abstract only)

Differential privacy in data publication and analysis

Partiqle: an elastic SQL engine over key-value stores

Auto-completion learning for XML

Tiresias: a demonstration of how-to queries

SigSpot: mining significant anomalous regions from time-evolving networks (abstract only)

JustMyFriends: full SQL, full transactional amenities, and access privacy

Logos: a system for translating queries into narratives

AstroShelf: understanding the universe through scalable navigation of a galaxy of annotations

VRRC: web based tool for visualization and recommendation on co-authorship network (abstract only)

Dynamic optimization of generalized SQL queries with horizontal aggregations

PAnG: finding patterns in annotation graphs

OPAvion: mining and visualization in large graphs

Fast sampling word correlations of high dimensional text data (abstract only)

ConsAD: a real-time consistency anomalies detector

VizDeck: self-organizing dashboards for visual analytics

CloudAlloc: a monitoring and reservation system for compute clusters

Interactive performance monitoring of a composite OLTP and OLAP workload

Kaizen: a semi-automatic index advisor

TIRAMOLA: elastic nosql provisioning through a cloud management platform

Towards scalable summarization and visualization of large text corpora (abstract only)

Content Provider	ACM Digital Library
Author	Sliwkanich, Tyler; Home, Mitchell; Barbosa, Denilson; Yong, Aaron; Schneider, Douglas
Abstract	Society is awash with problems that require analyzing vast quantities of text and data. From detecting flu trends in Twitter conversations to finding scholarly works that answer specific questions, we rely more and more on computers to process text for us. Text analytics is the application of computational, mathematical, and statistical models to derive information from large quantities of data that come primarily as text. Our project provides fast and effective text-analytics tools for large document collections, such as the blogosphere. We use natural language processing and database techniques to extract, collect, analyze, visualize, and archive information from text. We focus on discovering relationships between entities (people, places, organizations, etc.) mentioned in one or more sources (blog posts or news articles). We built a custom solution using mostly off-the-shelf, open-source tools to provide a scalable platform for users to search and analyze large text corpora. Currently, we provide two main outlets for users to discover these relations: (1) full-text search over the documents and (2) graph visualizations of the entities and their relationships. These give the user succinct and easily digestible information gleaned from the corpus as a whole. For example, a question like "which companies were bought by Google?" can be posed as entity:google relation:bought. The extracted data is stored in a combination of the NoSQL database CouchDB and Apache Lucene. This combination is justified because our workflow consists of offline batch insertions with almost no updates. Because we support specialized queries, we can forgo the flexibility of traditional SQL solutions and materialize all necessary indices, which are used to quickly query large amounts of denormalized data using MapReduce. Lucene provides a flexible and powerful query syntax to yield relevant ranked results to the user. Moreover, its indices are synchronized by a process subscribed to the list of database changes published by CouchDB. The graph visualizations rely on CouchDB's ability to export the data in any format: we currently use a customized graph visualization relying on XML data. Finally, we use memcached to further improve performance, especially for queries involving popular entities.
Starting Page	863
Ending Page	863
Page Count	1
ISBN	9781450312479
DOI	10.1145/2213836.2213970
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2012-05-20
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Text analytics; Information extraction
Content Type	Text
Resource Type	Article
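
The fielded query in the abstract (entity:google relation:bought) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Relation` record, the `parse_query` and `search` helpers, and the sample rows are all hypothetical, an in-memory tuple stands in for the CouchDB/Lucene store, and `functools.lru_cache` stands in for memcached.

```python
from functools import lru_cache
from typing import Dict, NamedTuple, Tuple


class Relation(NamedTuple):
    """One extracted relationship (hypothetical schema, not from the paper)."""
    entity: str    # subject entity
    relation: str  # relation label
    target: str    # object entity


# Hand-written sample rows for illustration only; not output of the system.
RELATIONS: Tuple[Relation, ...] = (
    Relation("google", "bought", "YouTube"),
    Relation("google", "bought", "Android"),
    Relation("google", "partnered", "ExampleCo"),
    Relation("examplecorp", "bought", "ExampleCo"),
)


def parse_query(query: str) -> Dict[str, str]:
    """Split 'field:value field:value' into lowercased field filters."""
    filters: Dict[str, str] = {}
    for token in query.split():
        field, sep, value = token.partition(":")
        field = field.lower()
        if not sep or not value:
            raise ValueError(f"expected field:value, got {token!r}")
        if field not in Relation._fields:
            raise ValueError(f"unknown field {field!r}")
        filters[field] = value.lower()
    return filters


@lru_cache(maxsize=1024)  # in-process stand-in for the memcached layer
def search(query: str) -> Tuple[Relation, ...]:
    """Return every sample row that matches all filters in the query."""
    filters = parse_query(query)
    return tuple(
        row
        for row in RELATIONS
        if all(getattr(row, f).lower() == v for f, v in filters.items())
    )


if __name__ == "__main__":
    for row in search("entity:google relation:bought"):
        print(row.target)
```

Repeating the same query string is served from the cache, which mirrors the abstract's point that caching helps most for queries about popular entities. A real deployment would replace the linear scan with an index lookup.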