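This seems to be a common problem, except that I have never had any trouble with it before and the usual fixes are not working. It is probably something silly, but I cannot find it.
I want to index a Yammer site because the Yammer API is not fast enough for my purposes. The problem is that when I try to update my index with the updateDocument function, the old documents are not deleted, even though I have a stored, non-analyzed unique key.
Here is the relevant code: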
Document newdoc = new Document();
newdoc.add(new Field(YammerMessageFields.URL, resultUrl, Field.Store.YES, Field.Index.NOT_ANALYZED));
newdoc.add(new Field(YammerMessageFields.THREAD_ID, threadID.toString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
newdoc.add(new Field(YammerMessageFields.AUTHOR, senderName, Field.Store.YES, Field.Index.ANALYZED));
newdoc.add(new Field(YammerMessageFields.CONTENTS, resultText, Field.Store.YES, Field.Index.ANALYZED));
Term key = new Term(YammerMessageFields.THREAD_ID, newdoc.getFieldable(YammerMessageFields.THREAD_ID).toString());
logger.debug("updating document with key: " + key);
try {
    IndexWriter writer = getIndexWriter();
    writer.updateDocument(key, newdoc);
    writer.close();
} catch (IOException e) {
}
This is what I see in the log:
2012-05-11 12:02:29,816 DEBUG [http-8088-2] LuceneIndex - https://www.yammer.com/api/v1/messages/?newer_than=0
2012-05-11 12:02:38,594 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173285202>
2012-05-11 12:02:45,167 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173033239>
2012-05-11 12:02:51,686 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173014568>
2012-05-11 12:02:51,871 DEBUG [http-8088-2] LuceneIndex - new items:3
2012-05-11 12:03:27,393 DEBUG [http-8088-2] YammerResource - return all documents
2012-05-11 12:03:27,405 DEBUG [http-8088-2] YammerResource - nr docs:3
2012-05-11 12:03:27,405 DEBUG [http-8088-2] YammerResource - nr dels:0
...
next update
...
2012-05-11 12:03:35,802 DEBUG [http-8088-2] LuceneIndex - https://www.yammer.com/api/v1/messages/?newer_than=0
2012-05-11 12:03:43,933 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173322760>
2012-05-11 12:03:50,467 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173285202>
2012-05-11 12:03:56,982 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173056406>
2012-05-11 12:04:03,533 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173033239>
2012-05-11 12:04:10,097 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173030769>
2012-05-11 12:04:16,629 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173014568>
2012-05-11 12:04:23,169 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173003570>
2012-05-11 12:04:23,341 DEBUG [http-8088-2] LuceneIndex - new items:7
2012-05-11 12:05:09,694 DEBUG [http-8088-1] YammerResource - return all documents
2012-05-11 12:05:09,696 DEBUG [http-8088-1] YammerResource - nr docs:10
2012-05-11 12:05:09,696 DEBUG [http-8088-1] YammerResource - nr dels:0
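So the keys recur (along with 4 new keys), but after this update I have 10 documents in my store instead of the expected 7 (with 3 deleted).
Edit: this is how I find the items, although I am actually displaying them and have also checked the index with Luke.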
IndexReader r = IndexReader.open(searchIndex.getIndex());
List<Document> docList = new ArrayList<Document>();
List<Document> delList = new ArrayList<Document>();
int num = r.numDocs();
num += r.numDeletedDocs();
for (int i = 0; i < num && i < max; i++) {
    if (!r.isDeleted(i))
        docList.add(r.document(i));
    else
        delList.add(r.document(i));
}
r.close();
logger.debug("nr docs:" + docList.size());
logger.debug("nr dels:" + delList.size());
Answer 0 (score: 1)
I cannot be sure without running some test code, but this looks wrong to me:
Term key = new Term(YammerMessageFields.THREAD_ID,
newdoc.getFieldable(YammerMessageFields.THREAD_ID).toString());
Are you sure it should not be:
Term key = new Term(YammerMessageFields.THREAD_ID,
newdoc.getFieldable(YammerMessageFields.THREAD_ID).stringValue());
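You then go on to use that key to try to update any matching existing document. If the key is wrong, the update will probably fail silently: nothing matches the term, so nothing is deleted and the new document is simply added. I suspect toString() on the field returns a description of the object rather than its bare value, which means the update can never match.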
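To see why a mismatched key produces duplicates rather than an error, recall that IndexWriter.updateDocument(term, doc) is documented as first deleting every document containing the term and then adding the new one. The following is a minimal, library-free sketch of that behaviour. TinyIndex and its methods are hypothetical stand-ins for illustration, not Lucene classes, and the thread ID is taken from the question's log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index; it only models "delete by exact term, then add".
public class TinyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Models the documented contract of updateDocument: delete matches, then add.
    public void updateDocument(String field, String termText, Map<String, String> doc) {
        docs.removeIf(d -> termText.equals(d.get(field)));
        docs.add(doc);
    }

    public int numDocs() {
        return docs.size();
    }

    // Builds a one-field document keyed by thread ID.
    public static Map<String, String> doc(String threadId) {
        Map<String, String> d = new HashMap<>();
        d.put("threadid", threadId);
        return d;
    }

    public static void main(String[] args) {
        // Term text as it appears in the question's log; it never equals the indexed value.
        String badKey = "stored,indexed<threadid:173285202>";
        TinyIndex wrong = new TinyIndex();
        wrong.updateDocument("threadid", badKey, doc("173285202"));
        wrong.updateDocument("threadid", badKey, doc("173285202"));
        System.out.println("wrong key: " + wrong.numDocs() + " docs");

        // Term text equal to the indexed value: the old copy is replaced.
        TinyIndex right = new TinyIndex();
        right.updateDocument("threadid", "173285202", doc("173285202"));
        right.updateDocument("threadid", "173285202", doc("173285202"));
        System.out.println("right key: " + right.numDocs() + " docs");
    }
}
```

With the mismatched key the second update deletes nothing, so the sketch ends up holding two copies of the same thread. With the matching key it holds one. That is the same pattern as the 10 documents and 0 deletions reported in the question.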
Calling toString() for anything other than logging or debugging (that is, for anything that contains logic) is usually a mistake.
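The log output in the question already hints at what toString() returns here: the key is printed as threadid:stored,indexed<threadid:173285202>, so the term text looks like a description of the field rather than its value. A minimal sketch of the distinction follows. DemoField is a hypothetical class, not Lucene's Field, and its toString() format is an assumption copied from that log line.

```java
// Hypothetical stand-in for a stored, indexed field; not a Lucene class.
public class DemoField {
    private final String name;
    private final String value;

    public DemoField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // The raw value: what an exact-match key should be built from.
    public String stringValue() {
        return value;
    }

    // A debugging description; the format is assumed from the question's log output.
    @Override
    public String toString() {
        return "stored,indexed<" + name + ":" + value + ">";
    }

    public static void main(String[] args) {
        DemoField f = new DemoField("threadid", "173285202");
        System.out.println(f.toString());    // description, unsuitable as a key
        System.out.println(f.stringValue()); // the value that was actually indexed
    }
}
```

Only the second string can match a non-analyzed field that was indexed with the bare thread ID, which is why the key should be built from stringValue().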