NDLI: Detecting and exploiting near-sortedness for efficient relational query evaluation

Tractability in probabilistic databases

The PADS project: an overview

Efficient reasoning about data trees via integer linear programming

Querying probabilistic business processes for sub-flows

Relaxed notions of schema mapping equivalence revisited

On the equivalence of distributed systems with queries and communication

Knowledge compilation meets database theory: compiling queries to decision diagrams

Relative expressive power of navigational querying on graphs

Faster query answering in probabilistic databases using read-once functions

On provenance and privacy

Generating, sampling and counting subclasses of regular tree languages

Artifact systems with data dependencies and arithmetic

Solutions in XML data exchange

Two-variable logic and key constraints on data words

On the optimal approximation of queries using tractable propositional languages

Complexity of higher-order queries

The complexity of evaluating tuple generating dependencies

View update translation for XML

Comparing workflow specification languages: a matter of views

Simplifying schema mappings

Satisfiability algorithms for conjunctive queries over trees

(Approximate) uncertain skylines

Conjunctive queries determinacy and rewriting

Detecting and exploiting near-sortedness for efficient relational query evaluation

Data cleaning and query answering with matching dependencies and matching functions

Detecting and exploiting near-sortedness for efficient relational query evaluation

Content Provider	ACM Digital Library
Author	Matsliah, Arie; Ben-Moshe, Sagi; Kanza, Yaron; Fischer, Mani; Staelin, Carl; Fischer, Eldar
Abstract	Many relational operations are best performed when the relations are stored sorted over the relevant attributes (e.g. the common attributes in a natural join operation). However, generally relations are not stored sorted because it is expensive to maintain them this way (and impossible whenever there is more than one relevant sort key). Still, many times relations turn out to be nearly-sorted, where most tuples are close to their place in the order. This state can result from "leftover sortedness", where originally sorted relations were updated, or were combined into interim results when evaluating a complex query. It can also result from weak correlations between attribute values. Currently, nearly-sorted relations are treated the same as unsorted relations, and when relational operations are evaluated for them, a generic algorithm is used. Yet, many operations can be computed more efficiently by an algorithm that exploits this near-ordering. However, to consistently benefit from using such algorithms the system should also refrain from using the wrong algorithm for relations which happen not to be sorted at all. Thus, an efficient test is required, i.e., a very fast approximation algorithm for establishing whether a given relation is sufficiently nearly-sorted. In this paper, we provide the theoretical foundations for improving query evaluation over possibly nearly-sorted relations. First we formally define what it means for a relation to be nearly-sorted, and show how operations over such relations, such as natural join, set operations and sorting, can be executed significantly more efficiently using an algorithm that we provide. If a relation is nearly-sorted enough, then it can be sorted using two sequential reads of the relation, and writing no intermediate data to disk. We then construct efficient probabilistic tests for approximating the degree of the near-sortedness of a relation without having to read an entire file. 
The role of our algorithms in a database management system setting is illustrated as soon as the theoretical foundation is laid out. Finally, we outline factors that relate to practical implementations of our algorithms. We show how our test can be enhanced to provide an approximation rather than just a yes-no answer, and discuss its implementability in real-life scenarios where some sparseness may be present in the database files (e.g. if they were created using a B*-tree approach). We also show how our sort can benefit distributed systems and systems that use a solid-state drive, which may very well become prevalent in the near future.
Starting Page	256
Ending Page	267
Page Count	12
File Format	PDF
ISBN	9781450305297
DOI	10.1145/1938551.1938584
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2011-03-21
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Solid state drives; Query processing; Property testing; Relational databases; Relational operators; Sorting
Content Type	Text
Resource Type	Article
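
The abstract describes two ideas: sorting a nearly-sorted relation cheaply, and testing near-sortedness by sampling rather than reading the whole file. The sketch below illustrates both with standard textbook techniques, not the algorithms from the paper itself. It assumes one particular reading of "nearly-sorted" (every item lies at most `k` positions from its sorted position), and the function names `sort_nearly_sorted` and `sampled_out_of_place_fraction` are hypothetical.

```python
import heapq
import random


def sort_nearly_sorted(items, k):
    """Yield items in sorted order in a single pass.

    Assumption (not taken from the paper): every item is at most k
    positions away from its position in the fully sorted order.
    Only k + 1 items are held in memory at any time.
    """
    heap = []
    for item in items:
        heapq.heappush(heap, item)
        # With k + 1 items buffered, the smallest buffered item must be
        # the next one in sorted order under the assumption above.
        if len(heap) > k:
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)


def sampled_out_of_place_fraction(seq, samples, rng):
    """Estimate the fraction of positions that look out of place.

    For each sampled index i, run a binary search for seq[i] as if seq
    were sorted; if the search does not arrive at i, count a miss.
    Each sample reads only O(log n) entries. Ties are broken by index
    so that equal values do not cause spurious misses.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    misses = 0
    for _ in range(samples):
        i = rng.randrange(n)
        target = (seq[i], i)
        lo, hi = 0, n - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            probe = (seq[mid], mid)
            if probe == target:
                break  # the search reached position i
            if target < probe:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            misses += 1  # search ended without reaching position i
    return misses / samples


if __name__ == "__main__":
    data = [2, 1, 4, 3, 6, 5]  # each item is at most 1 position off
    print(list(sort_nearly_sorted(data, k=1)))
    print(sampled_out_of_place_fraction(list(range(1000)), 50, random.Random(0)))
```

A query planner could, under these assumptions, run the sampled check first and fall back to a generic sort when the estimated fraction is high. How the estimate maps to a safe choice of `k` is not addressed by this sketch.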
