Techniques of inverted index:part1

Published on 11/17,2016

I talked about inverted index in my previous post and I want to talk about some techniques of inverted index in this post.

Var-Byte Coding: Variable-byte compression represents an integer in a variable number of bytes, where each byte consists of one status bit, indicating whether another byte follows the current one, followed by 7 data bits. Thus, 142 = 1·27 +16 is represented as 10000001 0001000, while 2 is represented as 00000010. Var-byte compression does not achieve a very good compression ratio, but is simple and allows for fast decoding and is thus used in many systems.

Rice Coding: This method compresses a sequence of integers by first choosing a b such that 2b is close to the average value. Each integer n is then encoded in two parts: a quotient q = n/(2b) stored in unary code using q + 1 bits, and a remainder r = n mod 2b stored in binary using b bits. Rice coding achieves very good compression on standard unordered collections but is slower than var-byte, though the gap in speed can be reduced by using an optimized implementation.




Leave a Reply

Add comment


Comments are moderated to prevent spam. This may cause a delay before your post appears.