Question

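How can I retrieve from a Lucene index the tokens that were used in a TokenStream Field (as a list of tokens, a Document, or anything else)? In other words, is it possible to get back from the index the tokens that were passed as tokens in the example below? (I don't know how to get the tokens out of a TokenStream.)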

doc.add(new Field("title", tokens))

I looked at the documentation for Field.tokenStreamValue(), but when I call doc.getFieldable(field_name), I only get null back.

I also tried this (from the third comment on "lucene - Fieldable.tokenStreamValue()"):

TokenSources.getTokenStream(reader, doc_id, field_name)

But I get:

java.lang.IllegalArgumentException: title in doc #630does not have any term position data stored
    at org.apache.lucene.search.highlight.TokenSources.getTokenStream(TokenSources.java:256)

Answer 1

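The TokenSources class is a helper class that retrieves a document's tokens for highlighting. There are two ways to retrieve the terms of a given document: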

re-analyze the stored field
read the document's term vector
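
To make the difference between the two routes concrete, here is a toy model in plain Java. It does not call Lucene, and every class and method name in it is hypothetical. It only illustrates the idea that a term vector keeps each distinct term once, so the original token order can be rebuilt only when positions were stored alongside the terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT Lucene code, and all names here are hypothetical.
final class TermVectorSketch {

    // Route 1: re-analyze the stored text. A naive lowercase/whitespace
    // tokenizer stands in for a real analyzer.
    static List<String> reanalyze(String storedText) {
        List<String> tokens = new ArrayList<>();
        for (String t : storedText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // What a term vector with positions records: distinct term -> its positions.
    static Map<String, List<Integer>> buildTermVector(List<String> tokens) {
        Map<String, List<Integer>> vector = new TreeMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            vector.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return vector;
    }

    // Route 2: rebuild the token sequence from the term vector. This only
    // works because the positions were kept; terms alone would lose the order.
    static List<String> fromTermVector(Map<String, List<Integer>> vector) {
        int size = 0;
        for (List<Integer> positions : vector.values()) {
            size += positions.size();
        }
        String[] ordered = new String[size];
        for (Map.Entry<String, List<Integer>> entry : vector.entrySet()) {
            for (int pos : entry.getValue()) {
                ordered[pos] = entry.getKey();
            }
        }
        return Arrays.asList(ordered);
    }
}
```

In this model, route 1 needs the original text to be stored, while route 2 needs the positions to be stored. That mirrors the trade-off between stored fields and term vectors.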

The method you are calling tries to read the document's term vector, but it fails because you did not enable term vectors at index time.

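So you can either enable term vectors at index time and keep using this method (see the documentation of the Field constructor and Field.TermVector), or re-analyze the content of the stored field. The first approach can give better performance, especially for large fields, while the second saves space (if your field is already stored, no additional information needs to be stored).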
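
A minimal sketch of the first option, assuming Lucene 3.6 with the highlighter module on the classpath. The class name FieldTokensSketch, the in-memory RAMDirectory, the sample field name and the use of document id 0 (the only document in the index) are illustrative assumptions and do not come from the question. The field is indexed with Field.TermVector.WITH_POSITIONS_OFFSETS so that TokenSources.getTokenStream has position and offset data to work with.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper class; assumes the Lucene 3.6 API.
final class FieldTokensSketch {

    static List<String> indexAndReadTokens(String text) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        Directory dir = new RAMDirectory(); // in-memory index, for the demo only

        // Index one document with term vectors (positions and offsets) enabled.
        IndexWriter writer =
                new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_36, analyzer));
        Document doc = new Document();
        doc.add(new Field("title", text, Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.addDocument(doc);
        writer.close();

        // Read the tokens back from the term vector of document 0.
        IndexReader reader = IndexReader.open(dir);
        try {
            TokenStream stream = TokenSources.getTokenStream(reader, 0, "title");
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            List<String> tokens = new ArrayList<String>();
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();
            stream.close();
            return tokens;
        } finally {
            reader.close();
        }
    }
}
```

For the pre-tokenized field in the question, Lucene 3.x also appears to offer a Field constructor that takes a TokenStream together with a Field.TermVector value, which would be the analogous change. Check the Javadoc of your Lucene version before relying on it. If the field is stored rather than term-vectored, TokenSources.getAnyTokenStream(reader, docId, field, analyzer) is the variant that falls back to re-analyzing the stored text.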
