Question

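We are building a Spring MVC application and use Lucene for text indexing and search. I store each object's ID in the index so that I can later retrieve the associated Java object. How can I fetch the document with a given saved ID and update a field that I set manually? I know how to search for a given text, but not how to get one specific document. Thank you.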

Save code:

// retrieve from the ID below
doc.add(new StringField("id", String.valueOf(objectId), Field.Store.YES));
// Update the Integer count below
LegacyIntField intField = new LegacyIntField("score", 0, Field.Store.YES);
intField.setIntValue(1);
doc.add(intField);

Current update code:

Path path = Paths.get(OUR_PATH);
Directory index_dir = FSDirectory.open(path);

IndexWriter writer = new IndexWriter(index_dir, new IndexWriterConfig(new StandardAnalyzer()));
IndexReader reader = DirectoryReader.open(writer);

Thank you.

Answer 1

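First of all, Lucene does not support updating a single field of a document, so there is nothing to gain from trying to isolate and optimize the update of one field.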

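Basically, what you are looking for is:

a way to load the original document that was previously indexed (Lucene does not provide this out of the box)
and a way to update an existing document (this is IndexWriter.updateDocument)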

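If the Lucene index is not your primary data store, use the primary data store to fetch the document, set the new value, and then reindex the whole document. In pseudocode: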

public void updateField(String docId, int newFieldValue) {
    MyDataObject data = primaryDataStore.fetch(docId);
    data.setFieldValue(newFieldValue);
    primaryDataStore.save(data);
    updateIndex(data);
}

public void updateIndex(MyDataObject object) {
    // convertToLucene is more or less the code in the
    // first snippet of your question 
    Document d = convertToLucene(object);
    // IndexWriter should be created once
    // IndexWriter.updateDocument will internally delete and index 
    // the document
    this.writer.updateDocument(new Term("id", object.getId()), d);
    // potentially call writer.commit()
}
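
To see the delete-then-add behaviour of IndexWriter.updateDocument in isolation, here is a minimal sketch. It assumes a Lucene 6.x to 9.x classpath. The class name ScoreUpdateSketch, the temporary index directory, the hard-coded id "42", and the use of StoredField for "score" (instead of the question's LegacyIntField) are illustrative choices, not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical demo class; not part of the question's or the answer's code.
final class ScoreUpdateSketch {

    /** Builds a complete document; every field has to be supplied again on each update. */
    static Document toDocument(String id, int score) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES)); // indexed as a single term, not analyzed
        doc.add(new StoredField("score", score));            // stored only, retrievable but not searchable
        return doc;
    }

    /** Indexes one document, replaces it, and returns { live document count, stored score }. */
    static int[] demo() {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("lucene-sketch"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(toDocument("42", 0));
            // Deletes every document whose "id" term is "42", then adds the new one.
            writer.updateDocument(new Term("id", "42"), toDocument("42", 1));
            writer.commit();

            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(new TermQuery(new Term("id", "42")), 10);
                Document stored = searcher.doc(hits.scoreDocs[0].doc);
                int score = stored.getField("score").numericValue().intValue();
                return new int[] { reader.numDocs(), score };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch only works as a lookup key because "id" is a StringField, which is indexed as one untokenized term. An analyzed TextField would generally not match the exact Term passed to updateDocument.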

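If Lucene is your primary data store, things get more complicated, and I strongly suggest (if it is not too late) using Solr or Elasticsearch, which provide a nice REST API and make Lucene behave more like a document store. Keep in mind that Lucene is not a "document data store" out of the box. If you still want to use Lucene as your primary data store, you can store the document in a stored field, in a format of your choice (JSON, binary serialization, ...).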

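To fetch the document, you have to run a search query on the "id" field. Build it with a TermQuery, use a collector or TopDocs, and then call document(int luceneDocId) on the IndexReader (or doc(int) on the IndexSearcher) to get the stored fields. In pseudocode (this replaces the primaryDataStore.fetch(docId) call used in the previous snippet):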

public MyDataObject fetchFromLucene(String docId) {
     IndexSearcher searcher = getSearcher();
     TopDocs docs = searcher.search(new TermQuery(new Term("id", docId)), 1);
     if (docs.totalHits > 0) {
         Document d = searcher.doc(docs.scoreDocs[0].doc);
         // "document_data" is a binary field you'll have to add
         // on every lucene docs where you put a serialized version
         // of your domain object.
         return deserialize( d.getBinaryValue("document_data") );
     }
     return null;
}

public MyDataObject deserialize(BytesRef data) {
    // a method to deserialize binary data into MyDataObject
    return deserializedData;
}
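
One possible way to fill in the deserialize stub is plain Java serialization. The sketch below uses only the JDK. The fields of MyDataObject and the DocumentDataCodec class are assumptions made for illustration, and JSON or another format would work just as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical domain object standing in for the answer's MyDataObject.
final class MyDataObject implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final int score;

    MyDataObject(String id, int score) {
        this.id = id;
        this.score = score;
    }
}

// Hypothetical helper; the name is an assumption, not a Lucene class.
final class DocumentDataCodec {

    /** Produces the bytes that would be written to the "document_data" stored field. */
    static byte[] serialize(MyDataObject object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** The offset and length parameters mirror the bytes/offset/length fields of Lucene's BytesRef. */
    static MyDataObject deserialize(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes, offset, length))) {
            return (MyDataObject) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("MyDataObject is not on the classpath", e);
        }
    }
}
```

On the Lucene side, the bytes could be indexed with new StoredField("document_data", bytes) and read back by passing the bytes, offset, and length of the BytesRef returned by getBinaryValue to deserialize. Java serialization should only be used on data you wrote yourself.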

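In short, if you want to use Lucene directly as your primary data store, you will end up writing a lot of boilerplate code. Note that you will also have to manage many low-level aspects of Lucene yourself, such as refreshing IndexReaders efficiently.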
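
As one illustration of the reader-refresh concern, Lucene ships a SearcherManager that reopens searchers on demand instead of opening a new reader for every lookup. The following is a minimal sketch, assuming a Lucene 6.x to 9.x classpath. The class name SearcherRefreshSketch, the temporary directory, and the id "42" are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical demo class; not part of the original answer.
final class SearcherRefreshSketch {

    /** Adds one document, refreshes the manager, and returns the number of hits for its id. */
    static int demo() {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("lucene-refresh"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
             SearcherManager manager = new SearcherManager(writer, null)) { // null = default SearcherFactory

            Document doc = new Document();
            doc.add(new StringField("id", "42", Field.Store.YES));
            writer.addDocument(doc);

            // Reopens the underlying reader only if the index has changed since the last refresh.
            manager.maybeRefreshBlocking();

            IndexSearcher searcher = manager.acquire();
            try {
                return searcher.search(new TermQuery(new Term("id", "42")), 1).scoreDocs.length;
            } finally {
                manager.release(searcher); // every acquire() must be paired with a release()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a long-running application the manager would typically be created once next to the IndexWriter and refreshed after writes or on a timer, rather than per request.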

Java, Lucene: Get a document by its saved ID, then update one of its fields
