Question

Here is some code to access terms in a Lucene document:
int docId = hits[i].doc;  
TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");  
TermPositionVector tpvector = (TermPositionVector)tfvector;  
// this part works only if there is one term in the query string,  
// otherwise you will have to iterate this section over the query terms.  
int termidx = tfvector.indexOf(querystr);  
int[] termposx = tpvector.getTermPositions(termidx);  
TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

My question is: given the termposx array, how do I get the terms at those positions?

Answer 1

Zincup: termposx has {7,19,34}. What is the word at position 8 or 9? How do I access it?

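TermPositionVector.getTermPositions() returns an array of the positions at which the term is found.

The term is identified by its index, that is, the number at which it appears in the array of term strings, as obtained from the indexOf method.

So it is the same term that appears at each of the positions {7,19,34}.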

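With TermPositionVector you can look up the positions at which each term is found, but not the other way round.

I'm afraid you will have to iterate to find the words at positions 8 and 9. I will explore the API further and let you know if I find a solution.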
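
In the meantime, here is a rough sketch of the iteration I have in mind. It inverts the term-to-positions data into a position-to-term map. It is plain Java with no Lucene dependency, so the PositionIndex class, the invert method and the sample data are all made up for illustration. In your code you would fill the two arrays from tpvector.getTerms() and tpvector.getTermPositions(i) for each index i. I am also assuming one term per position; if the analyzer emits several tokens at the same position, the last one written wins.

```java
import java.util.Map;
import java.util.TreeMap;

public class PositionIndex {

    /**
     * Inverts a term -> positions mapping into a position -> term mapping.
     * terms[i] is the i-th term; positions[i] holds the positions of terms[i].
     */
    public static Map<Integer, String> invert(String[] terms, int[][] positions) {
        Map<Integer, String> byPosition = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            if (positions[i] == null) {
                continue; // no positions available for this term
            }
            for (int pos : positions[i]) {
                byPosition.put(pos, terms[i]);
            }
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for tpvector.getTerms() and
        // tpvector.getTermPositions(i).
        String[] terms = {"fox", "lucene", "quick"};
        int[][] positions = {{9}, {7, 19, 34}, {8}};

        Map<Integer, String> byPosition = invert(terms, positions);
        System.out.println(byPosition.get(8)); // prints "quick"
        System.out.println(byPosition.get(9)); // prints "fox"
    }
}
```

A lookup returns null for a position that has no entry, which can happen if the analyzer dropped a token there (a stop word, for example).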
