An inverted index for a collection of documents is a structure that stores, for each term (word) occurring somewhere in the collection, information about the locations where it occurs. In particular, for each term t, the index contains an inverted list I_t consisting of a number of index postings. Each posting in I_t contains information about the occurrences of t in one particular document d: usually the ID of the document (the docID), the number of occurrences of t in d (the frequency), and possibly other information about the locations of the occurrences within the document and their contexts. The postings in each list are usually sorted by docID.
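As a rough illustration, here is a minimal in-memory sketch in Python of the structure just described. The names `Posting`, `tokenize`, `build_index`, and `intersect` are hypothetical, and the tokenizer (lowercased alphanumeric runs) is a simplifying assumption. Each posting keeps the docID, the frequency, and the word positions. Because documents are visited in increasing docID order, every inverted list comes out sorted by docID.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    doc_id: int                                          # the docID
    freq: int                                            # occurrences of the term in this document
    positions: list[int] = field(default_factory=list)   # word offsets of each occurrence


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, list[Posting]]:
    """Map each term t to its inverted list I_t, sorted by docID."""
    index: dict[str, list[Posting]] = defaultdict(list)
    # Visit documents in increasing docID order so each list is appended in sorted order.
    for doc_id in sorted(docs):
        positions: dict[str, list[int]] = defaultdict(list)
        for pos, term in enumerate(tokenize(docs[doc_id])):
            positions[term].append(pos)
        for term, pos_list in positions.items():
            index[term].append(Posting(doc_id, len(pos_list), pos_list))
    return dict(index)


def intersect(a: list[Posting], b: list[Posting]) -> list[int]:
    """Return docIDs present in both lists, using one linear merge over sorted lists."""
    i = j = 0
    result: list[int] = []
    while i < len(a) and j < len(b):
        if a[i].doc_id == b[j].doc_id:
            result.append(a[i].doc_id)
            i += 1
            j += 1
        elif a[i].doc_id < b[j].doc_id:
            i += 1
        else:
            j += 1
    return result
```

For example, with `docs = {1: "the cat sat on the mat", 2: "the dog sat", 3: "a cat and a dog"}`, the list for `the` holds postings for documents 1 (frequency 2, positions 0 and 4) and 2 (frequency 1). Keeping lists sorted by docID is what lets `intersect(index["cat"], index["dog"])` find the documents containing both terms (here only document 3) in a single pass.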

Search Engines and Their Issues

Search engines come in a number of configurations that reflect the applications they are designed for. Web search engines, such as Google and Yahoo!, must be able to capture, or crawl, many terabytes of data and then provide subsecond response times to millions of queries submitted every day from around the world. The “big issues” in the design of search engines include those identified for information retrieval: effective ranking algorithms, evaluation, and user interaction. There are, however, a number of additional critical features of search engines that result from their deployment in large-scale, operational environments.

Foremost among these features is the performance of the search engine, in terms of measures such as response time, query throughput, and indexing speed. Response time is the delay between submitting a query and receiving the result list; throughput is the number of queries that can be processed in a given time; and indexing speed is the rate at which text documents can be transformed into indexes for searching. An index is a data structure that improves the speed of search, and the design of indexes for search engines is one of the major topics of this blog.
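The three performance measures above can be made concrete with a small timing harness. The sketch below is a hypothetical, single-threaded Python example, not taken from any particular engine: it times the construction of a toy index (term to sorted docIDs, with whitespace tokenization as an assumption) to estimate indexing speed, times each single-term lookup to estimate response time, and divides the number of queries by the total elapsed time to estimate throughput. The function names `build_index` and `measure` are illustrative.

```python
import time
from collections import defaultdict

_EPS = 1e-9  # guard against a zero elapsed-time reading on a coarse clock


def build_index(docs: list[str]) -> dict[str, list[int]]:
    # Toy index: term -> list of docIDs; docIDs are list positions, so lists stay sorted.
    index: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return dict(index)


def measure(docs: list[str], queries: list[str]) -> dict[str, float]:
    if not queries:
        raise ValueError("need at least one query to measure")

    # Indexing speed: documents transformed into the index per second.
    start = time.perf_counter()
    index = build_index(docs)
    index_secs = max(time.perf_counter() - start, _EPS)

    # Response time per query, plus total wall time for the whole batch.
    latencies = []
    batch_start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        index.get(q.lower(), [])  # a single-term lookup stands in for query processing
        latencies.append(time.perf_counter() - t0)
    batch_secs = max(time.perf_counter() - batch_start, _EPS)

    return {
        "indexing_docs_per_sec": len(docs) / index_secs,
        "mean_response_time_sec": sum(latencies) / len(latencies),
        "throughput_queries_per_sec": len(queries) / batch_secs,
    }
```

Because the queries here run one after another, throughput is roughly the reciprocal of the mean response time. Production engines process many queries concurrently, which is why the two measures are tracked separately.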



