Solr config

Published on 11/28/2016

In this post I want to show how to index a Wikipedia XML dump in Solr using the DataImportHandler.

1. Open a command prompt (cmd).

2. Go to the bin folder of your Solr installation with the cd command.

3. Run solr.cmd start to start Solr.

4. Run solr.cmd create -c wiki to create a core named wiki.

5. Go to <<solr-6.2.1\server\solr\wiki\conf>> to configure the wiki core for indexing the dump.

6. This folder contains a managed-schema file. Rename it to schema.xml and write the field definitions below

<field name="_version_" type="long" indexed="true" stored="true"/>

<field name="id" type="string" indexed="true" stored="true" required="true"/>

<field name="title" type="string" indexed="true" stored="true"/>

<field name="revision" type="int" indexed="true" stored="false"/>

<field name="user" type="string" indexed="true" stored="false"/>

<field name="userId" type="int" indexed="true" stored="false"/>

<field name="_text_" type="text_en" indexed="true" stored="false"/>

These replace the following default definitions:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

<field name="_version_" type="long" indexed="true" stored="false"/>

<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

7. Open <<solr-6.2.1\server\solr\wiki\conf\solrconfig.xml>>

After the line <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />

add <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />

After the /browse handler

<requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">

    <lst name="defaults">

      <str name="echoParams">explicit</str>

    </lst>

</requestHandler>

add

<requestHandler name="/dihupdate" class="org.apache.solr.handler.dataimport.DataImportHandler" startup="lazy">

    <lst name="defaults">

      <str name="config">data-config.xml</str>

    </lst>

  </requestHandler>

8. Create a data-config.xml file in <<solr-6.2.1\server\solr\wiki\conf>> with the content below. Change the url attribute to the location of your own dump file.

<dataConfig>

       <dataSource type="FileDataSource" encoding="UTF-8" />

        <document>

        <entity name="page"

                processor="XPathEntityProcessor"

                stream="true"

                forEach="/mediawiki/page/"

                url="F:\solr-6.2.1\server\solr\wiki6\enwiki-20160113-pages-articles1.xml"

                transformer="RegexTransformer,DateFormatTransformer"

                >

            <field column="id"        xpath="/mediawiki/page/id" />

            <field column="title"     xpath="/mediawiki/page/title" />

            <field column="revision"  xpath="/mediawiki/page/revision/id" />

            <field column="user"      xpath="/mediawiki/page/revision/contributor/username" />

            <field column="userId"    xpath="/mediawiki/page/revision/contributor/id" />

            <!-- name maps this column onto the _text_ field defined in schema.xml -->

            <field column="text"      name="_text_" xpath="/mediawiki/page/revision/text" />

            <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'" />

            <field column="$skipDoc"  regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>

       </entity>

        </document>

</dataConfig>
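Before running the import against a large dump, it can help to see what this configuration is meant to extract. Here is a minimal Python sketch that mimics the idea on a tiny sample: it streams over page elements, reads the same paths as the xpath attributes above, and drops redirect pages the way the $skipDoc rule is intended to. The sample XML, the PATHS table and the function names are made up for illustration, and this is only an approximation of what XPathEntityProcessor and RegexTransformer do, not their actual implementation.

```python
import io
import re
import xml.etree.ElementTree as ET

# Same pattern as the $skipDoc rule in data-config.xml.
REDIRECT = re.compile(r"^#REDIRECT .*")

# Hypothetical two-page sample in the shape of a MediaWiki dump.
SAMPLE = """<mediawiki>
  <page>
    <title>Solr</title>
    <id>1</id>
    <revision>
      <id>100</id>
      <timestamp>2016-01-13T15:04:05Z</timestamp>
      <contributor><username>alice</username><id>7</id></contributor>
      <text>Solr is a search platform.</text>
    </revision>
  </page>
  <page>
    <title>Apache Solr</title>
    <id>2</id>
    <revision>
      <id>101</id>
      <timestamp>2016-01-13T16:00:00Z</timestamp>
      <contributor><username>bob</username><id>8</id></contributor>
      <text>#REDIRECT [[Solr]]</text>
    </revision>
  </page>
</mediawiki>"""

# Column name -> path relative to <page>, mirroring the xpath attributes.
PATHS = {
    "id": "id",
    "title": "title",
    "revision": "revision/id",
    "user": "revision/contributor/username",
    "userId": "revision/contributor/id",
    "text": "revision/text",
    "timestamp": "revision/timestamp",
}


def local(tag):
    """Drop an XML namespace prefix, e.g. '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def is_redirect(text):
    """True when replacing the pattern with 'true' leaves exactly 'true'."""
    return bool(text) and REDIRECT.sub("true", text).lower() == "true"


def iter_pages(stream):
    """Yield one dict per <page>, skipping redirects, without loading the whole file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Real dumps declare a namespace; strip it so plain paths work.
        elem.tag = local(elem.tag)
        if elem.tag != "page":
            continue
        row = {col: elem.findtext(path) for col, path in PATHS.items()}
        elem.clear()  # release the children of the page we just read
        if not is_redirect(row["text"]):
            yield row


rows = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
print([row["title"] for row in rows])  # ['Solr']
```

To try it on a real dump, pass an open file (for example open(path, "rb")) to iter_pages instead of the in-memory sample and stop after the first few rows.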

9. Restart Solr with solr.cmd stop -all followed by solr.cmd start.

10. Open your browser and go to http://localhost:8983/solr/wiki/dihupdate?command=full-import to index your data (without a command parameter the handler only reports its status).
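The import can also be started from a script instead of the browser. The following is a minimal Python sketch, assuming Solr is listening on the default localhost:8983 and that the handler is registered as /dihupdate on the wiki core as in this post. It relies on the DataImportHandler's full-import and status commands; the function names and the polling interval are my own choices.

```python
import json
import time
import urllib.parse
import urllib.request

BASE = "http://localhost:8983/solr"  # default host and port used in this post


def dih_url(core, handler, command):
    """Build a DataImportHandler URL, e.g. .../wiki/dihupdate?command=status&wt=json."""
    query = urllib.parse.urlencode({"command": command, "wt": "json"})
    return f"{BASE}/{core}{handler}?{query}"


def call(url):
    """GET a Solr URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def run_import(core="wiki", handler="/dihupdate", poll_seconds=5):
    """Start a full import, then poll until the handler stops reporting 'busy'."""
    call(dih_url(core, handler, "full-import"))
    while True:
        status = call(dih_url(core, handler, "status"))
        if status.get("status") != "busy":
            return status.get("statusMessages", {})
        time.sleep(poll_seconds)
```

With Solr running, calling run_import() should return the handler's status messages once the import has finished. A large dump can take a long time, so expect the call to block for a while.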

11. Open the wiki core in the Solr admin UI and check numDocs to see how many documents were indexed.
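The same number can be read without the admin UI. This is a small Python sketch, assuming the default localhost:8983 address and the Core Admin STATUS action, which reports numDocs under the core's index section. The sample response is a trimmed, made-up one used only to show the parsing step, and the function names are mine.

```python
import json
import urllib.parse
import urllib.request


def core_status_url(core, base="http://localhost:8983/solr"):
    """Build the Core Admin STATUS URL for one core."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base}/admin/cores?{query}"


def num_docs(status_response, core):
    """Pull numDocs for one core out of a decoded STATUS response."""
    return status_response["status"][core]["index"]["numDocs"]


def fetch_num_docs(core="wiki"):
    """Ask a running Solr for the core status and return its numDocs."""
    with urllib.request.urlopen(core_status_url(core)) as resp:
        return num_docs(json.load(resp), core)


# Hypothetical, trimmed response used only to exercise the parsing step.
sample = {"status": {"wiki": {"name": "wiki", "index": {"numDocs": 1234, "maxDoc": 1250}}}}
print(num_docs(sample, "wiki"))  # prints 1234
```

With Solr running, fetch_num_docs("wiki") should return the same value the admin UI shows for the core.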

