Question

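I have a Lucene index whose .fdt file is about 5 GB. I add records frequently (about 1,000 a day) and never delete any. Each document has 5 fields, and only one of them holds the text content of an HTML page. I run a query parser against this index to look for certain keywords. Even though the index is optimized after every insert, finding a keyword in the HTML text content takes almost a minute. Has anyone run into this, and does anyone have suggestions on how to fix it?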

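These are the steps my code performs:

1. Using a SqlDataReader, read the table contents: title, EmployeeID, headline (a short description of the employee's department), date (when the employee was added to the table or their information last changed), and data (an HTML version of the employee's details).
2. For each record in the table, do the following: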

string body = StripHtml(data ?? "");   // hypothetical helper: plain text stripped from the HTML of the web page or data
if (body.Length > 500)
{
    body = body.Substring(0, 500);     // only the first 500 characters are indexed
}

var doc = new Document();
doc.Add(new Field("title", staticname, Field.Store.YES, Field.Index.ANALYZED)); // title is always "Employee info"
doc.Add(new Field("Employeeid", keyid.Replace(",", " "), Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("headline", head, Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("date", DateTools.DateToString(date, DateTools.Resolution.SECOND), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("body", body, Field.Store.YES, Field.Index.ANALYZED));

indexWriter.AddDocument(doc);
indexWriter.Optimize();   // merges the whole index down to one segment, on every insert
indexWriter.Commit();
indexWriter.Dispose();
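The question does not show the step that strips the text out of the HTML. A minimal sketch of it using only the .NET base library, assuming a regex-based tag stripper is good enough for these generated employee pages (a real HTML parser copes better with malformed markup). `HtmlText.ToIndexText` is a hypothetical name; it also applies the 500-character cut, and doing the cut after the tags are removed means the limit counts visible text rather than markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Matches <script> and <style> elements together with their content.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any remaining tag, e.g. <p>, </div>, <br/>.
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Matches runs of whitespace (including the non-breaking space from &nbsp;).
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Hypothetical helper: returns plain text for the "body" field,
    // cut to at most maxLength characters.
    public static string ToIndexText(string html, int maxLength)
    {
        if (string.IsNullOrEmpty(html))
            return "";

        string text = ScriptOrStyle.Replace(html, " ");
        text = Tag.Replace(text, " ");               // replace tags with a space so words do not run together
        text = WebUtility.HtmlDecode(text);          // &amp; -> &, &lt; -> <, and so on
        text = Whitespace.Replace(text, " ").Trim(); // collapse whitespace left behind by the tags

        return text.Length > maxLength ? text.Substring(0, maxLength) : text;
    }
}
```

In the indexing loop this would be called as `string body = HtmlText.ToIndexText(data, 500);`.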

---- In the search program:

string searchword = "disability";
QueryParser queryParser = new QueryParser(VERSION, "body", analyzer);
string word = "+Employeeid:" + Employeeid + " +body:" + searchword;
Query query = queryParser.Parse(word);

IndexReader reader = IndexReader.Open(luceneIndexDirectory, true);   // a new read-only reader is opened for every search
Searcher indexSearch = new IndexSearcher(reader);
try
{
    TopDocs hits = indexSearch.Search(query, 1);

    if (hits.TotalHits > 0)
    {
        float score = hits.ScoreDocs[0].Score;
        if (score > MINSCORE)
        {
            results.Add(result);  // a list holding EmployeeID, searchwordID, searchword, score
        }
    }
}
finally
{
    indexSearch.Dispose();
    reader.Dispose();
    indexWriter.Dispose();
}
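The search program builds the query string by concatenating `Employeeid` and `searchword`. If either value ever contains query-parser syntax (`+`, `:`, `(`, a double quote, and so on), `Parse` reads it as an operator or throws a `ParseException`. A minimal sketch of escaping a value before concatenating it; `QueryText.Escape` is a hypothetical helper written for illustration, and its character set assumes the classic query parser syntax. Lucene's classic `QueryParser` also has a static escape method (`QueryParser.Escape` in Lucene.Net), which is the safer choice when it is available.

```csharp
using System;
using System.Text;

public static class QueryText
{
    // Characters the classic Lucene query parser treats as syntax (assumed set).
    private const string Special = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: prefixes each special character with a backslash
    // so the parser reads it as part of the term.
    public static string Escape(string term)
    {
        if (term == null)
            throw new ArgumentNullException(nameof(term));

        var sb = new StringBuilder(term.Length);
        foreach (char c in term)
        {
            if (Special.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With it, the query line becomes `string word = "+Employeeid:" + QueryText.Escape(Employeeid) + " +body:" + QueryText.Escape(searchword);`. Escaping only changes how the string is parsed; the analyzer still processes the term as before.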

Any comments are appreciated.

Thanks, M

Answer 1

Don't store the body and headline fields in the index.

doc.Add(new Field("headline", head, Field.Store.NO, Field.Index.ANALYZED));
doc.Add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));

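Storing them is of no use for searching: stored values are only returned with the hits, while matching is done against the inverted index. Since the fields stay ANALYZED, dropping the stored copies shrinks the .fdt file without changing what the queries find.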
