How do I get the tokenized terms of a Lucene document field after analysis?

Date: 2015-08-03 13:41:32

Tags: java lucene token

I am using Lucene 5.1.0. After analyzing and indexing documents, I want to get a list of all the indexed terms that belong to a particular document.

{
    File[] files = FILES_TO_INDEX_DIRECTORY.listFiles();
    for (File file : files) {
        Document document = new Document();
        Reader reader = new FileReader(file);
        document.add(new TextField("fieldname", reader));
        iwriter.addDocument(document);
    }

    iwriter.close();
    IndexReader indexReader = DirectoryReader.open(directory);
    int maxDoc = indexReader.maxDoc();
    for (int i = 0; i < maxDoc; i++) {
        Document doc = indexReader.document(i);
        String[] terms = doc.getValues("fieldname");
    }
}

The terms come back null. Is there a way to get the terms saved for each document?

1 Answer:

Answer 0 (score: 1)

Here is sample code for the answer, using TokenStream:

TokenStream ts = analyzer.tokenStream("myfield", reader);
// The Analyzer class will construct the Tokenizer, TokenFilter(s), and CharFilter(s),
// and pass the resulting Reader to the Tokenizer.
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = ts.addAttribute(CharTermAttribute.class);

try {
    ts.reset(); // Resets this stream to the beginning. (Required)
    while (ts.incrementToken()) {
        // Use AttributeSource.reflectAsString(boolean)
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));
        String term = charTermAttribute.toString();
        System.out.println(term);
    }
    ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
} finally {
    ts.close(); // Release resources associated with this stream.
}