Build index base

Tools used: DataScraper, a data and clue extraction tool.

DataScraper is integrated with a facility named Index Manager, which manages the Lucene indexing engine on the DataStore server. Through Index Manager you can specify the data schema and property-specific indexing parameters, such as a per-property boost parameter, the key attribute, and the storing switch. Index Manager is sufficient for building most vertical search engines.



Exercises

The following indexing parameters are set via Index Manager for the ebook search engine (how to operate Index Manager is described in DataScraper User's Guide#Index Manager):

Property         Store Param    Index Param    Boost Param
content brief    YES            TOKENIZED      1.0
content          YES            TOKENIZED      1.0
title            YES            TOKENIZED      1.2
book page        YES            UNTOKENIZED    1.0
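
To make the three parameters concrete, the following is a minimal Python sketch of a toy in-memory index that uses the same settings as the table above. It is not Index Manager's or Lucene's implementation. The names `FieldParams`, `EBOOK_SCHEMA`, `field_score`, `search` and the sample `BOOKS` records are all hypothetical, and the scoring (boost multiplied by the number of matching terms) is a deliberate simplification of Lucene's real ranking. It only illustrates the usual meaning of the settings: a tokenized property is split into words before matching, an untokenized property matches only as a whole value, and a higher boost makes matches in that property count for more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    store: bool      # keep the original value so it can be shown in results
    tokenized: bool  # True: split into words; False: index the whole value as one term
    boost: float     # weight applied to matches in this property


# Settings copied from the parameter table; the structure itself is hypothetical.
EBOOK_SCHEMA = {
    "content brief": FieldParams(store=True, tokenized=True, boost=1.0),
    "content": FieldParams(store=True, tokenized=True, boost=1.0),
    "title": FieldParams(store=True, tokenized=True, boost=1.2),
    "book page": FieldParams(store=True, tokenized=False, boost=1.0),
}

# Made-up sample records, for illustration only.
BOOKS = [
    {
        "title": "Python Basics",
        "content brief": "Covers asp briefly",
        "content": "Variables and loops",
        "book page": "http://example.com/books/1",
    },
    {
        "title": "ASP Programming",
        "content brief": "An introduction",
        "content": "Server pages and scripting",
        "book page": "http://example.com/books/2",
    },
]


def field_score(value, keyword, params):
    """Score one property value against a keyword (toy scoring, not Lucene's)."""
    if params.tokenized:
        # Tokenized: lowercase, split on whitespace, count matching words.
        hits = value.lower().split().count(keyword.lower())
    else:
        # Untokenized: the whole value is a single term and must match exactly.
        hits = 1 if value == keyword else 0
    return params.boost * hits


def search(docs, keyword, schema=EBOOK_SCHEMA):
    """Return matching documents, highest score first."""
    scored = []
    for doc in docs:
        total = sum(field_score(doc[name], keyword, p) for name, p in schema.items())
        if total > 0:
            scored.append((total, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

In this sketch, `search(BOOKS, "asp")` ranks the book with the keyword in its title ahead of the one that only mentions it in the brief, because of the 1.2 title boost. A search for the word "books" finds nothing, since the untokenized book page only matches when the full URL is given.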

The ebook search engine is now built. Load the page http://localhost:8080/datastore/searchharvest.htm and enter the keyword asp; many books about ASP should be listed.