Building an index

Time: 2015-09-13 00:01:15

Tags: java lucene

I get an error while building a file index. Here is my code:

    try (Directory dir = FSDirectory.open(indexPath.toFile());
         Analyzer analyzer = new StandardAnalyzer()) {

        IndexWriterConfig iwc = new IndexWriterConfig(Version.LATEST, analyzer);
        // Create the index if it does not exist yet, otherwise append to it
        iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        try (IndexWriter indexWriter = new IndexWriter(dir, iwc)) {
            ................

            // Path and type are stored as single, untokenized terms
            Field pathField = new StringField(PATH_ATTRIBUTE,
                                              file.toString(), Field.Store.YES);
            Document document = new Document();
            document.add(pathField);
            document.add(new StringField(TYPE_ATTRIBUTE, clazz.getSimpleName(), Field.Store.YES));
            // File contents are tokenized from a Reader (indexed, not stored)
            document.add(new TextField(DATA_ATTRIBUTE,
                         new BufferedReader(
                         new InputStreamReader(
                         stream, StandardCharsets.UTF_8))));

            if (indexWriter.getConfig().getOpenMode() == IndexWriterConfig.OpenMode.CREATE_OR_APPEND) {
                indexWriter.addDocument(document);
            }
        }
    }

Exception stack trace:

  

  02:36:36.741 [main] ERROR c.r.e.d.service.utils.LuceneUtils - startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483646,endOffset=-2147483645
  java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483646,endOffset=-2147483645
    at org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:208) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:57) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:90) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:618) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1234) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]

I also found this JIRA issue: https://issues.apache.org/jira/browse/LUCENE-5111. It looks like it should have been fixed in version 4.8, but the error still occurs on 4.10.4.

Update

I also checked this on version 5.3.0 and got the same result:

  

  15:22:50.649 [main] ERROR developertest.LuceneUtils - startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483645,endOffset=-2147483646
  java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483645,endOffset=-2147483646
    at org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:170) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:36) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1475) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1254) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
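
For what it's worth, the offsets in both messages sit right at the edge of the 32-bit int range (Integer.MAX_VALUE is 2147483647), so they look like plain int wraparound. The sketch below only reproduces the arithmetic. It assumes the end offset is the start offset plus a token length, and that the token was 5 characters long; neither is confirmed by the traces. The class and method names are made up for illustration and are not part of Lucene.

```java
public class OffsetOverflowDemo {

    // Hypothetical helper: adds a token length to a start offset using
    // 32-bit int arithmetic, which wraps around silently on overflow.
    static int endOffset(int startOffset, int tokenLength) {
        return startOffset + tokenLength;
    }

    // Reports whether the same addition would leave the int range.
    // Math.addExact throws ArithmeticException instead of wrapping.
    static boolean overflows(int startOffset, int tokenLength) {
        try {
            Math.addExact(startOffset, tokenLength);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Values from the 4.10.4 trace; token length 5 is an assumption
        System.out.println(endOffset(2147483646, 5)); // -2147483645
        // Values from the 5.3.0 trace; token length 5 is an assumption
        System.out.println(endOffset(2147483645, 5)); // -2147483646
        System.out.println(overflows(2147483646, 5)); // true
    }
}
```

If that reading is right, the input stream being indexed would be around 2 GB of characters or more, which is where an int character offset runs out.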

0 answers:

No answers