Context indexing

Published on 12/01,2016

I have a seminar about indexing in my search engine course. One aspect of indexing is context-based indexing, so I want to talk about it in this post.

In the context-based indexing architecture, web pages are stored in the crawled web page repository. The index holds valuable compressed information for each web page. Preprocessing steps (stemming and removal of stop words) are performed on the documents. The keywords are then extracted from each document, and their multiple contexts are identified from WordNet. The indexer maintains the keyword index using binary search trees.
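To make the preprocessing step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the stop-word list is tiny, `naive_stem` is a crude suffix stripper and not a real stemming algorithm, and `TOY_SENSES` is a hand-written stand-in for a WordNet lookup.

```python
import re

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "by"}

# Assumption: a hand-written stand-in for looking up senses in WordNet.
TOY_SENSES = {
    "bank": ["financial institution", "sloping land beside a river"],
    "river": ["large natural stream of water"],
}


def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text):
    """Lowercase, tokenize, drop stop words, stem, and de-duplicate in order."""
    seen, keywords = set(), []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        stem = naive_stem(token)
        if stem not in seen:
            seen.add(stem)
            keywords.append(stem)
    return keywords


def contexts_for(keyword):
    """Return the known senses of a keyword (empty list if none are known)."""
    return TOY_SENSES.get(keyword, [])
```

For example, `extract_keywords("The banks of the river are flooding")` yields the keywords `bank`, `river` and `flood`, and `contexts_for("bank")` returns the two toy senses of "bank".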

Documents are arranged by the keywords they contain, and the index is maintained in lexical order. For every letter of the alphabet there is one BST (binary search tree) containing the keywords whose first letter matches that letter. Each node in the BST points to a structure that contains the list of contextual meanings of that keyword, and each meaning holds pointers to the documents that match it.

Keyword – a keyword that appears in one or more documents in the local database and that is matched against the user's query keyword.

List of Contexts – the list of all the different usages/senses of the keyword obtained from WordNet.

C1 – stands for contextual sense 1. Each contextual sense (C) has an associated list of pointers to the documents in which that sense appears.

D1 – stands for the pointer to document 1.
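The index structure described above could be sketched roughly as follows. This is only my illustration, not a reference implementation: the names `Node`, `ContextIndex`, `add` and `keywords` are hypothetical, document pointers are represented as plain strings such as "D1", and keywords are assumed to be non-empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    keyword: str
    # Maps each contextual sense (C1, C2, ...) to its document pointers (D1, D2, ...).
    contexts: Dict[str, List[str]] = field(default_factory=dict)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class ContextIndex:
    """One binary search tree per first letter (hypothetical helper class)."""

    def __init__(self):
        self.trees: Dict[str, Node] = {}  # first letter -> BST root

    def add(self, keyword, context, doc_id):
        """Record that doc_id uses keyword in the given contextual sense."""
        keyword = keyword.lower()
        letter = keyword[0]
        root = self.trees.get(letter)
        if root is None:
            node = self.trees[letter] = Node(keyword)
        else:
            node = root
            # Walk down the tree, creating the node if the keyword is new.
            while node.keyword != keyword:
                side = "left" if keyword < node.keyword else "right"
                child = getattr(node, side)
                if child is None:
                    child = Node(keyword)
                    setattr(node, side, child)
                node = child
        docs = node.contexts.setdefault(context, [])
        if doc_id not in docs:
            docs.append(doc_id)

    def keywords(self, letter):
        """In-order traversal: the keywords under one letter in lexical order."""
        out, stack, node = [], [], self.trees.get(letter)
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.keyword)
            node = node.right
        return out
```

After adding "bank" (in two senses), "bat" and "ball", calling `keywords("b")` walks the "b" tree in order and returns the keywords lexically sorted, which is the property the index relies on.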

Steps to search the index to resolve a query

1. For the query keyword given by the user, search the index for the entry matching the first letter of the keyword.

2. The corresponding BST is selected for further searching.

3. If a match is found, the corresponding list of meanings (C1, C2, C3, etc.) is displayed so that the user can select one. Once the user has chosen, the corresponding list of pointers is used to fetch the documents from the repository, and these are displayed as the final result for the query.

4. Otherwise, if the keyword is not found in the corresponding BST, it is inserted into the BST and "no match found" is displayed to the user.
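The four steps above can be sketched as a small Python function. The names `Node`, `find_or_insert` and `resolve_query` are hypothetical, and `choose_sense` is a callback I use as a stand-in for showing the list of meanings to the user. One more assumption of mine: a keyword that was inserted earlier but has no contexts yet is also reported as "no match".

```python
class Node:
    """BST node: a keyword plus its senses and their document pointers."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.contexts = {}  # sense -> list of document pointers
        self.left = None
        self.right = None


def find_or_insert(trees, keyword):
    """Return (node, found). A missing keyword is inserted, as in step 4."""
    keyword = keyword.lower()
    letter = keyword[0]            # step 1: match the first letter
    node = trees.get(letter)       # step 2: select the corresponding BST
    if node is None:
        trees[letter] = Node(keyword)
        return trees[letter], False
    while True:
        if keyword == node.keyword:
            return node, True
        side = "left" if keyword < node.keyword else "right"
        child = getattr(node, side)
        if child is None:
            child = Node(keyword)
            setattr(node, side, child)
            return child, False
        node = child


def resolve_query(trees, keyword, choose_sense):
    """Return the document pointers for the chosen sense, or None for no match."""
    node, found = find_or_insert(trees, keyword)
    if not found or not node.contexts:
        return None                        # step 4: "no match found"
    chosen = choose_sense(list(node.contexts))  # step 3: the user picks a meaning
    return node.contexts[chosen]
```

In a real system `choose_sense` would display C1, C2, C3 and wait for the user's selection; here any function that takes the list of senses and returns one of them will do, for example `lambda senses: senses[0]`.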

 

 

