I get an error while building a file index. Here is my code:
try (Directory dir = FSDirectory.open(indexPath.toFile());
     Analyzer analyzer = new StandardAnalyzer()) {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LATEST, analyzer);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
    try (IndexWriter indexWriter = new IndexWriter(dir, iwc)) {
        ................
        // Path and type are indexed as single untokenized terms and stored
        Field pathField = new StringField(PATH_ATTRIBUTE,
                file.toString(), Field.Store.YES);
        Document document = new Document();
        document.add(pathField);
        document.add(new StringField(TYPE_ATTRIBUTE, clazz.getSimpleName(), Field.Store.YES));
        // File contents are tokenized from a Reader and not stored
        document.add(new TextField(DATA_ATTRIBUTE,
                new BufferedReader(
                        new InputStreamReader(
                                stream, StandardCharsets.UTF_8))));
        if (indexWriter.getConfig().getOpenMode() == IndexWriterConfig.OpenMode.CREATE_OR_APPEND) {
            indexWriter.addDocument(document);
        }
    }
}
Exception stack trace:
02:36:36.741 [main] ERROR c.r.e.d.service.utils.LuceneUtils - startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483646,endOffset=-2147483645
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483646,endOffset=-2147483645
    at org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:208) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:57) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:90) ~[lucene-analyzers-common-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:618) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1234) ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
I also found this JIRA issue: https://issues.apache.org/jira/browse/LUCENE-5111. It looks like it should have been fixed in version 4.8, but the error still occurs on 4.10.4.
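One observation about the numbers in the message: they are consistent with a 32-bit int offset wrapping around, since 2147483646 is one below Integer.MAX_VALUE and adding 5 to it in int arithmetic gives exactly -2147483645. That would suggest a 5-character token starting just before the 2^31 - 1 character mark of the stream. This is only an inference from the reported values, not a confirmed diagnosis, and the class and helper names below are hypothetical, used purely to show the arithmetic:

```java
public class OffsetOverflowDemo {

    // Hypothetical helper: recovers the token length implied by a start/end
    // offset pair, ASSUMING endOffset wrapped around the 32-bit int range once.
    static long impliedTokenLength(int startOffset, int endOffset) {
        // Undo the assumed wrap by adding 2^32 when endOffset is negative.
        long unwrappedEnd = endOffset < 0 ? endOffset + (1L << 32) : endOffset;
        return unwrappedEnd - startOffset;
    }

    public static void main(String[] args) {
        int start = 2147483646;       // startOffset reported in the message
        int wrapped = start + 5;      // int addition overflows silently in Java
        System.out.println(wrapped);  // prints -2147483645

        // Token length implied by the reported offset pair, under the same assumption.
        System.out.println(impliedTokenLength(2147483646, -2147483645)); // prints 5
    }
}
```

If this reading is right, the failure would depend on the input stream containing more than roughly 2^31 characters, rather than on the open mode or the field setup.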
UPDATE
I also checked this on version 5.3.0 and got the same result:
15:22:50.649 [main] ERROR developertest.LuceneUtils - startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483645,endOffset=-2147483646
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=2147483645,endOffset=-2147483646
    at org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:170) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:36) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51) ~[lucene-analyzers-common-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:45]
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1475) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1254) ~[lucene-core-5.3.0.jar:5.3.0 1696229 - noble - 2015-08-17 16:59:03]