Indexing in Big Data

Published on 12/15/2016

In this post I want to talk about indexing in big data. Before that, you should know the processing approach that big data systems use.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
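To make the map, group, and reduce flow concrete, here is a minimal single-process sketch in Python. It assumes the whole input fits in memory, and `run_mapreduce`, `parity_map`, and `sum_reduce` are hypothetical names for illustration, not the API of Hadoop or any other MapReduce framework. A real framework also distributes the map and reduce tasks across machines and handles failures, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable


# Hypothetical single-process driver; not the API of any real MapReduce framework.
def run_mapreduce(
    inputs: Iterable[tuple[Any, Any]],
    map_fn: Callable[[Any, Any], Iterable[tuple[Any, Any]]],
    reduce_fn: Callable[[Any, list], Any],
) -> dict:
    # Map phase: each input key/value pair yields zero or more intermediate pairs.
    groups: dict = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            # Shuffle: collect every intermediate value under its intermediate key.
            groups[ikey].append(ivalue)
    # Reduce phase: merge all values that share the same intermediate key.
    return {ikey: reduce_fn(ikey, values) for ikey, values in groups.items()}


# Toy usage (hypothetical): sum numbers grouped by parity.
def parity_map(index: int, number: int):
    yield ("even" if number % 2 == 0 else "odd"), number


def sum_reduce(key: str, values: list) -> int:
    return sum(values)


print(run_mapreduce(enumerate([1, 2, 3, 4, 5]), parity_map, sum_reduce))
# {'odd': 9, 'even': 6}
```

The user supplies only the two functions; grouping intermediate values by key is the framework's job, which is why it lives in the driver.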

Example of basic indexing:

Consider the problem of counting the number of occurrences of each word in a large collection of documents. The map function emits each word of a document paired with a count of 1, and the reduce function sums all the counts emitted for a given word.
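The word-count job can be sketched in Python as below. This is an in-memory illustration, not a distributed implementation: `wc_map`, `wc_reduce`, and `word_count` are hypothetical names, and splitting on whitespace after lowercasing is a simplifying assumption about tokenization. The shuffle step here sorts the intermediate pairs so that equal words become adjacent and can then be grouped.

```python
from itertools import groupby
from operator import itemgetter


def wc_map(doc_id, text):
    # Hypothetical mapper: emit <word, 1> for each word occurrence.
    # Lowercasing and whitespace splitting are simplifying assumptions.
    for word in text.lower().split():
        yield word, 1


def wc_reduce(word, counts):
    # Every count emitted for this word arrives here; their sum is the total.
    return sum(counts)


def word_count(documents):
    # Map phase over every (document ID, text) pair.
    intermediate = [pair for doc_id, text in documents.items() for pair in wc_map(doc_id, text)]
    # Shuffle: sort by word so equal keys are adjacent, then group them.
    intermediate.sort(key=itemgetter(0))
    return {
        word: wc_reduce(word, [count for _, count in pairs])
        for word, pairs in groupby(intermediate, key=itemgetter(0))
    }


docs = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}
print(word_count(docs))
# {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```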

Inverted Index: The map function parses each document and emits a sequence of <word, document ID> pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs, and emits a <word, list(document ID)> pair. The set of all output pairs forms a simple inverted index.
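A minimal in-memory sketch of building the inverted index this way, in Python. The names `index_map`, `index_reduce`, and `build_inverted_index` are hypothetical, whitespace tokenization is an assumption, and emitting each word only once per document (via a set) is a design choice of this sketch so that posting lists contain no duplicate IDs.

```python
from collections import defaultdict


def index_map(doc_id, text):
    # Hypothetical mapper: emit <word, document ID> once per distinct word.
    # Deduplicating with a set is an assumption of this sketch.
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(word, doc_ids):
    # Sort the document IDs to form this word's posting list.
    return sorted(doc_ids)


def build_inverted_index(documents):
    # Shuffle: group every emitted document ID under its word.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, did in index_map(doc_id, text):
            grouped[word].append(did)
    # Reduce each group into a <word, list(document ID)> pair.
    return {word: index_reduce(word, ids) for word, ids in sorted(grouped.items())}


docs = {"d1": "the quick brown fox", "d2": "the lazy dog", "d3": "quick fox"}
index = build_inverted_index(docs)
print(index["fox"])  # ['d1', 'd3']
print(index["the"])  # ['d1', 'd2']
# Documents containing both "quick" and "fox":
print(sorted(set(index["quick"]) & set(index["fox"])))  # ['d1', 'd3']
```

Once built, the index answers "which documents contain this word?" with a single lookup instead of a scan over every document.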