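I am using Lucene.NET to index documents. My main goal is to search and return the line number and the line of text from the document.
Here is the indexing code: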
using (TextReader contentsReader = new StreamReader(fi.FullName))
{
    doc.Add(new StringField("FullFileName", fi.FullName, Field.Store.YES));
    doc.Add(new StringField("LastModifiedDate", modDate, Field.Store.YES));
    //doc.Add(new TextField("Contents", contentsReader.ReadToEnd(), Field.Store.YES));
    int lineCount = 1;
    string line = String.Empty;
    while ((line = contentsReader.ReadLine()) != null)
    {
        doc.Add(new Int32Field("LineNo", lineCount, Field.Store.YES));
        doc.Add(new TextField("Contents", line, Field.Store.YES));
        lineCount++;
    }
    Console.ForegroundColor = ConsoleColor.Blue;
    Console.WriteLine("adding " + fi.Name);
    Console.ResetColor();
    writer.AddDocument(doc);
}
As you can see, I add the file name and the modified date, then loop through all the lines in the file and add a TextField for each line.
This is how I search:
Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);
QueryParser parser = new QueryParser(Lucene.Net.Util.LuceneVersion.LUCENE_48, "Contents", analyzer);
Lucene.Net.Search.Query query = parser.Parse(searchString);
Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(new System.IO.DirectoryInfo(indexDir));
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Index.DirectoryReader.Open(directory));
TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true);
searcher.Search(query, collector);
ScoreDoc[] hits1 = collector.GetTopDocs().ScoreDocs;
for (int i = 0; i < hits1.Length; i++)
{
    int docId = hits1[i].Doc;
    float score = hits1[i].Score;
    Lucene.Net.Documents.Document doc = searcher.Doc(docId);
    string result = "FileName: " + doc.Get("FullFileName") + "\n" +
                    " Line No: " + doc.Get("LineNo") + "\n" +
                    " Contents: " + doc.Get("Contents");
}
However, my search returns 0 hits, whereas if I simply comment out the while loop and uncomment the commented-out line above it, I get results.
What could be the problem?
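Answer 0: (score: 0)
This is probably because the reuse strategy for analyzers changed in Lucene 4.0+. The reuse strategy caches tokens in a dictionary, so on each iteration the index stores only some of the tokens, whereas all tokens are processed when they are passed in at once. You may need to override the reuse strategy; I overrode it directly so that it behaves the same way as in Lucene 3.0.5. Let me know if this helps.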
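Separately from the reuse-strategy suggestion, and without claiming it explains the 0-hit result: when a document holds several fields with the same name, `doc.Get("LineNo")` and `doc.Get("Contents")` return only the first value added, so a single document per file cannot report which line matched. A minimal sketch of one possible alternative, assuming Lucene.NET 4.8 and an already-open `IndexWriter`, is to index each line as its own document. The `LineIndexer` class and `IndexFileByLine` method are hypothetical names introduced for illustration; the field names are taken from the question.

```csharp
using System.IO;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper, not part of the original post.
public static class LineIndexer
{
    // Adds one Lucene document per line of the file, so that every hit
    // carries exactly one LineNo value and one Contents value.
    public static void IndexFileByLine(IndexWriter writer, FileInfo fi, string modDate)
    {
        using (TextReader contentsReader = new StreamReader(fi.FullName))
        {
            int lineCount = 1;
            string line;
            while ((line = contentsReader.ReadLine()) != null)
            {
                // A fresh document for each line keeps the number and text paired.
                var doc = new Document();
                doc.Add(new StringField("FullFileName", fi.FullName, Field.Store.YES));
                doc.Add(new StringField("LastModifiedDate", modDate, Field.Store.YES));
                doc.Add(new Int32Field("LineNo", lineCount, Field.Store.YES));
                doc.Add(new TextField("Contents", line, Field.Store.YES));
                writer.AddDocument(doc);
                lineCount++;
            }
        }
    }
}
```

With an index built this way, the search loop from the question could stay unchanged: each hit would be a single line, and `doc.Get("LineNo")` would return that line's number. The trade-off is a larger index, since the file name and modified date are stored once per line.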