How can I search multiple sites with the Lucene search engine API?

Time: 2011-11-26 01:46:22

Tags: lucene.net

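I hope someone can help me soon :-) I would like to know how to search multiple sites with Lucene. (All of the sites are in one index.)

I have managed to search a single site and to index multiple sites, but I cannot search across all of the sites.

Consider my method: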

    private void PerformSearch()
    {
        DateTime start = DateTime.Now;

        //Create the Searcher object
        string strIndexDir = Server.MapPath("index") + @"\" + mstrURL;
        IndexSearcher objSearcher = new IndexSearcher(strIndexDir); 

        //Parse the query, "text" is the default field to search
        Query objQuery = QueryParser.Parse(mstrQuery, "text", new StandardAnalyzer()); 

        //Create the result DataTable
        mobjDTResults.Columns.Add("title", typeof(string));
        mobjDTResults.Columns.Add("path", typeof(string));
        mobjDTResults.Columns.Add("score", typeof(string));
        mobjDTResults.Columns.Add("sample", typeof(string));
        mobjDTResults.Columns.Add("explain", typeof(string));

        //Perform search and get hit count
        Hits objHits = objSearcher.Search(objQuery);
        mintTotal = objHits.Length();

        //Create Highlighter
        QueryHighlightExtractor highlighter = new QueryHighlightExtractor(objQuery, new StandardAnalyzer(), "<B>", "</B>");

        //Initialize "Start At" variable
        mintStartAt = GetStartAt();

        //How many items we should show?
        int intResultsCt = GetSmallerOf(mintTotal, mintMaxResults + mintStartAt);

        //Loop through results and display
        for (int intCt = mintStartAt; intCt < intResultsCt; intCt++) 
        {
            //Get the document from the results
            Document doc = objHits.Doc(intCt);

            //Get the document's ID and set the cache location
            string strID = doc.Get("id");
            string strLocation = "";
            if (mstrURL.StartsWith("www"))
                strLocation = Server.MapPath("cache") + 
                    @"\" + mstrURL + @"\" + strID + ".htm";
            else
                strLocation = doc.Get("path") + doc.Get("filename");

            //Load the HTML page from cache
            string strPlainText;
            using (StreamReader sr = new StreamReader(strLocation, System.Text.Encoding.Default))
            {
                strPlainText = ParseHTML(sr.ReadToEnd());
            }

            //Add result to results datagrid
            DataRow row = mobjDTResults.NewRow();
            if (mstrURL.StartsWith("www"))
                row["title"] = doc.Get("title");
            else
                row["title"] = doc.Get("filename");
            row["path"] = doc.Get("path");
            row["score"] = String.Format("{0:f}", (objHits.Score(intCt) * 100)) + "%";
            row["sample"] = highlighter.GetBestFragments(strPlainText, 200, 2, "...");
            //Explain() expects the Lucene document number, not the position in the hit list
            Explanation objExplain = objSearcher.Explain(objQuery, objHits.Id(intCt));
            row["explain"] = objExplain.ToHtml();
            mobjDTResults.Rows.Add(row);
        } 
        objSearcher.Close();

        //Finalize results information
        mTsDuration = DateTime.Now - start;
        mintFromItem = mintStartAt + 1;
        mintToItem = GetSmallerOf(mintStartAt + mintMaxResults, mintTotal);
    }

As you can see, I use the site URL mstrURL when I create the searcher object:

string strIndexDir = Server.MapPath("index") + @"\" + mstrURL;

What should I do when I want to search multiple sites?

I am actually using this code.

1 answer:

Answer 0 (score: 0)

Combine the Searcher for each of your sites into a MultiSearcher.

For details, see this question: Multiple Indexes search in Lucene.Net
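A minimal sketch of that suggestion, assuming the same older Lucene.Net API the question uses (the static QueryParser.Parse and the Hits class; newer releases differ) and assuming each site has its own index folder under one root directory, as the question's strIndexDir implies. The class name MultiSiteSearch, the method CountHits, and the parameters indexRoot and siteUrls are hypothetical and not part of the original code.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class MultiSiteSearch
{
    // indexRoot: for example Server.MapPath("index") (assumption).
    // siteUrls:  hypothetical list of per-site index folder names.
    public static int CountHits(string indexRoot, string[] siteUrls, string queryText)
    {
        // Open one IndexSearcher per site index.
        Searchable[] searchers = new Searchable[siteUrls.Length];
        for (int i = 0; i < siteUrls.Length; i++)
        {
            searchers[i] = new IndexSearcher(Path.Combine(indexRoot, siteUrls[i]));
        }

        // MultiSearcher presents all of the indexes as a single searcher.
        MultiSearcher multi = new MultiSearcher(searchers);
        try
        {
            // "text" is the default field, as in the question.
            Query query = QueryParser.Parse(queryText, "text", new StandardAnalyzer());
            Hits hits = multi.Search(query);
            return hits.Length();
        }
        finally
        {
            // Closing the MultiSearcher also closes the underlying searchers.
            multi.Close();
        }
    }
}
```

The Hits object returned by the MultiSearcher can be looped over in the same way as in PerformSearch. Because the cache path in that method is built from mstrURL, each document would probably need to carry its own site name, for example in a stored field (an assumption, the question does not show how the documents were indexed).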