NDLI: GPText: Greenplum parallel statistical text analysis framework

Content Provider	ACM Digital Library
Author	Li, Kun; Khatri, Sunny; Grant, Christan; Wang, Daisy Zhe; Chitouras, George
Abstract	Many companies keep large amounts of text data inside relational databases. Several challenges exist in using state-of-the-art systems to analyze such datasets. First, an expensive data transfer cost must be paid up front to move data between databases and analytics systems. Second, many popular text analytics packages do not scale up to production-sized datasets. In this paper, we introduce GPText, the Greenplum parallel statistical text analysis framework, which addresses these problems by supporting statistical inference and learning algorithms natively in a massively parallel processing database system. GPText seamlessly integrates the Solr search engine and applies statistical algorithms such as k-means and LDA using MADLib, an open source library for scalable in-database analytics that can be installed on PostgreSQL and Greenplum. In addition, GPText developed and contributed a linear-chain conditional random field (CRF) module to MADLib to enable information extraction tasks such as part-of-speech tagging and named entity recognition. We show the performance and scalability of the parallel CRF implementation. Finally, we describe an eDiscovery application built on the GPText framework.
Starting Page	31
Ending Page	35
Page Count	5
File Format	PDF
ISBN	9781450322027
DOI	10.1145/2486767.2486774
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2013-06-23
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Massive parallel processing; Text analytics; RDBMS
Content Type	Text
Resource Type	Article
