Loading...
Please wait, while we are loading the content...
Similar Documents
Processing of Huffman Compressed Texts with a Super-Alphabet (2003)
| Content Provider | CiteSeerX |
|---|---|
| Author | Fredriksson, Kimmo Tarhio, Jorma |
| Abstract | Abstract. We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in O(n log 2 σ b time, where n is the size of the compressed text in bytes, σ is the size of the alphabet, and b is a user specified parameter. The method uses b a variable size super-alphabet, with an average size of O ( ) sym-H log2 σ bols, where H is the entropy of the text. Each super-symbol is processed in O(1) time. The algorithm uses O(2 b) space and O(b2 b) preprocessing time. The method can be easily augmented by auxiliary functions, which can e.g. decompress the text or perform pattern matching in the compressed text. We give three example functions: decoding the text in average time O(n log2 σ), where w is the number of bits in a Hw machine word; an Aho-Corasick dictionary matching algorithm, which works in time O(n log2 σ + t), where t is the number of occurrences b reported; and a shift-or string matching algorithm that works in time O(n log2 σ ⌈(m + s − 1)/w ⌉ + t), where m is the length of the pattern and b s depends on the encoding. The Aho-Corasick algorithm uses an automaton with variable length moves, i.e. it processes variable number of states at each step. The shift-or algorithm makes variable length shifts, effectively also processing variable number of states at each step. The b H log2 σ number of states processed in O(1) time is O (). The method can be applied to several other algorithms as well. We conclude with some experimental results. 1 |
| File Format | |
| Publisher Date | 2003-01-01 |
| Publisher Institution | In Proceedings of SPIRE’2003, LNCS 2857 |
| Access Restriction | Open |
| Subject Keyword | Average Time Variable Length Shift Shift-or String Efficient Algorithm Variable Size Super-alphabet Example Function Preprocessing Time Algorithm Us Average Size Sym-h Log2 Bols Aho-corasick Algorithm Compressed Text Shift-or Algorithm Variable Number Experimental Result Aho-corasick Dictionary Matching Algorithm Variable Length Move Several Algorithm Hw Machine Word Auxiliary Function |
| Content Type | Text |
| Resource Type | Proceeding |