Finding and ranking multiple phrase matches in Lucene-indexed documents

Time: 2012-01-17 13:38:32

Tags: solr lucene phrases

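Given a set of documents containing text, I want to search for a phrase, return every match, and rank the matches. I know how to get Lucene/Solr to tell me which documents match and to highlight the hits within a document, but how do I get a ranking in which several matches from the same document appear as separate results?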

First document.  It has a single line of text.
Second document.  This text line is quite short.
This is another line containing more text and is a bit longer.

If I search for "text line", I would expect to find three matches, ranked as follows:

2nd document -> ...This "text line" is quite short.
1st document -> ...It has a single "line of text".
2nd document -> ...another "line containing more text" and is...

Is this possible? If so, how?

1 answer:

Answer 0: (score: -1)

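If you want one match per line, make each line its own document. Don't confuse the term "document" with the fact that your text actually lives in a single file: in Lucene, a document is simply the unit that gets indexed, scored, and returned.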

If you want to keep a link back to the file, just index the file's id in another (stored) field.

{ id: "myfile.txt",
  text: "first line" }

{ id: "myfile.txt",
  text: "second line" }
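
As a rough illustration of the per-line approach, here is a minimal Python sketch that splits a file's text into one document per line. It only builds the document dicts and does not talk to Solr. The `file` field name, the `lines_to_docs` helper, and the `myfile.txt#1` key format are hypothetical choices for this example, not something taken from the answer above. The sketch gives each line its own key on the assumption that `id` is the schema's uniqueKey, in which case Solr replaces an existing document when a new one arrives with the same key.

```python
import json


def lines_to_docs(file_id, text):
    """Turn each non-empty line of `text` into its own document dict."""
    docs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines; there is nothing to search in them
        docs.append({
            "id": f"{file_id}#{line_no}",  # hypothetical per-line unique key
            "file": file_id,               # stored field linking back to the file
            "text": line,                  # the searchable line itself
        })
    return docs


if __name__ == "__main__":
    sample = "first line\nsecond line\n"
    # Print the documents as JSON, ready to hand to whatever indexing client you use.
    print(json.dumps(lines_to_docs("myfile.txt", sample), indent=2))
```

With documents shaped like this, every matching line is scored and returned as a separate hit, and the `file` field tells you which file it came from.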