Question

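My requirement is to search documents in Elasticsearch using a fuzzy match and then "rescore" them by comparing a field value of each document with the input string. For example, if the query returns 3 documents (doc: 1, 2, 3), then to compare against the constant value 'Star Wars', the comparison should happen as follows: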

doc:1, MovieName:"Star Wars" (compare ('Star Wars','Star Wars'))
doc:2, MovieName:"Starr Warz" (compare ('Star Wars','Starr Warz'))
doc:3, MovieName:"The Star Wars" (compare ('Star Wars','The Star Wars'))
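
To make the intended compare(...) step concrete, here is a minimal sketch. The question does not say which similarity measure is used, so this assumes a normalized Levenshtein edit distance; the class and method names (SimilaritySketch, editDistance, similarity) are hypothetical and not part of Elasticsearch or of the plugin in question.

```java
final class SimilaritySketch {

    // Classic two-row dynamic-programming Levenshtein edit distance
    // (insertions, deletions and substitutions each cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring rule: 1.0 for identical strings, lower as more edits are needed.
    // The comparison is case-sensitive.
    static float similarity(String input, String candidate) {
        int longest = Math.max(input.length(), candidate.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(input, candidate) / longest;
    }

    public static void main(String[] args) {
        String input = "Star Wars";
        for (String name : new String[] {"Star Wars", "Starr Warz", "The Star Wars"}) {
            System.out.printf("compare('%s', '%s') = %.3f%n", input, name, similarity(input, name));
        }
    }
}
```

Any other string metric could be swapped in here; the point is only that the rescorer needs the stored MovieName value of each hit to feed into such a function.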

I found the following Elasticsearch rescore plugin example and implemented it to achieve the above: https://github.com/elastic/elasticsearch/blob/6.2/plugins/examples/rescore/src/main/java/org/elasticsearch/example/rescore/ExampleRescoreBuilder.java

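I am able to pass the input 'Star Wars' to the plugin and access it there, but I am having trouble getting the value of the MovieName field for the documents returned in the results (topDocs).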

My query:

  GET movie-idx/_search?
    {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "fields": [
                  "MovieName"
                ],
                "query": "Star Wars",
                "minimum_should_match": "61%",
                "fuzziness": 1,
                "_name": "fuzzy"
              }
            }
          ]
        }
      },
      "rescore": {
        "calculateMovieScore": {
          "MovieName": "Star Wars"
        }
      }
    }

My rescorer class looks like this:

private static class DocsRescorer implements Rescorer {
        private static final DocsRescorer INSTANCE = new DocsRescorer();

        @Override
        public TopDocs rescore(TopDocs topDocs, IndexSearcher searcher, RescoreContext rescoreContext) throws IOException {
            DocRescoreContext context = (DocRescoreContext) rescoreContext;
            int end = Math.min(topDocs.scoreDocs.length, rescoreContext.getWindowSize());

            MovieScorer MovieScorer = new MovieScorerBuilder()
                    .withInputName(context.MovieName)
                    .build();

            for (int i = 0; i < end; i++) {
                String name = <get MovieName values from actual document returned by topdocs>
                float score = MovieScorer.calculateScore(name);
                topDocs.scoreDocs[i].score = score;
            }

            List<ScoreDoc> scoreDocList =  Stream.of(topDocs.scoreDocs).filter((a) -> a.score >= context.threshold).sorted(
                    (a, b) -> {
                        if (a.score > b.score) {
                            return -1;
                        }
                        if (a.score < b.score) {
                            return 1;
                        }
                        // Safe because doc ids >= 0
                        return a.doc - b.doc;
                    }
            ).collect(Collectors.toList());
            ScoreDoc[] scoreDocs = scoreDocList.toArray(new ScoreDoc[scoreDocList.size()]);
            topDocs.scoreDocs = scoreDocs;
            return topDocs;
        }

        @Override
        public Explanation explain(int topLevelDocId, IndexSearcher searcher, RescoreContext rescoreContext,
                                   Explanation sourceExplanation) throws IOException {
            DocRescoreContext context = (DocRescoreContext) rescoreContext;
            // Note that this is inaccurate because it ignores factor field
            return Explanation.match(context.factor, "test", singletonList(sourceExplanation));
        }

        @Override
        public void extractTerms(IndexSearcher searcher, RescoreContext rescoreContext, Set<Term> termsSet) {
            // Since we don't use queries there are no terms to extract.
        }
    }

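My understanding is that the plugin code executes once: it receives topDocs as the result of the initial query (the fuzzy search in this case), and for (int i = 0; i < end; i++) loops over each document returned in the result. The place where I need help is: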

String name = <get MovieName value from actual document returned by topdocs>

Answer 1

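I know it has been more than 2 years, but I ran into the same problem and found a solution, so I am posting it here. This was done for a Rescorer plugin in ES 7.8.0. The basic example I worked from was the grouping plugin.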

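It is a bunch of code that I do not fully understand, but the main principle is that you need an IFD (IndexFieldData<?>) instance for the field you want to fetch. In my example, I only needed the _id of each hit. It looks like this: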

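1) Prepare the IFD in advance and pass it to the RescoreContext: add a member to the class that extends RescoreContext so the IFD is kept in the context, and call it "idField" (it is used in section 3).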

@Override
public RescoreContext innerBuildContext(int windowSize, QueryShardContext queryShardContext) throws IOException {
    return new MyRescoreContext(windowSize, queryShardContext.getForField(queryShardContext.fieldMapper("_id")));
}

2) Next, in the Rescorer itself (method rescore(...)):

2.1) First, sort the hits by scoreDoc.doc

 ScoreDoc[] hits = topDocs.scoreDocs; 
 Arrays.sort(hits, Comparator.comparingInt((d) -> d.doc));

2.2) Do the black magic (code that I do not understand)

List<LeafReaderContext> readerContexts = searcher.getIndexReader().leaves();
int currentReaderIx = -1;
int currentReaderEndDoc = 0;
LeafReaderContext currentReaderContext = null;
    
for (int i = 0; i < end; i++) {
    ScoreDoc hit = hits[i];

    // find segment that contains current document
    while (hit.doc >= currentReaderEndDoc) {
        currentReaderIx++;
        currentReaderContext = readerContexts.get(currentReaderIx);
        currentReaderEndDoc = currentReaderContext.docBase + currentReaderContext.reader().maxDoc();
    }

    // convert the index-wide doc ID into an ID local to that segment
    int docId = hit.doc - currentReaderContext.docBase;

    // code from section 3 goes here //
}
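
The loop above is less magical than it looks. A Lucene index is split into segments (leaves); each leaf owns a contiguous range of index-wide doc IDs starting at its docBase and spanning maxDoc() IDs, and doc values are read per segment using segment-local IDs. Because the hits were sorted by doc ID first, a single forward-moving cursor can find the right segment for every hit. The sketch below replays that logic on plain arrays so it runs without Lucene; SegmentLookupSketch and toSegmentLocal are hypothetical names, and the docBase and maxDoc arrays stand in for the values of each LeafReaderContext.

```java
import java.util.Arrays;

final class SegmentLookupSketch {

    // Returns {segmentIndex, localDocId} for each index-wide doc ID.
    // sortedGlobalDocs must be ascending so the segment cursor only moves forward.
    // Assumes every doc ID falls inside one of the given segments.
    static int[][] toSegmentLocal(int[] sortedGlobalDocs, int[] docBase, int[] maxDoc) {
        int[][] result = new int[sortedGlobalDocs.length][];
        int segment = -1;
        int segmentEnd = 0; // exclusive upper bound of doc IDs covered so far
        for (int i = 0; i < sortedGlobalDocs.length; i++) {
            int doc = sortedGlobalDocs[i];
            // Advance until the current segment's range contains this doc.
            while (doc >= segmentEnd) {
                segment++;
                segmentEnd = docBase[segment] + maxDoc[segment];
            }
            result[i] = new int[] {segment, doc - docBase[segment]};
        }
        return result;
    }

    public static void main(String[] args) {
        // Three made-up segments covering doc IDs 0..4, 5..7 and 8..11.
        int[] docBase = {0, 5, 8};
        int[] maxDoc = {5, 3, 4};
        int[] hits = {9, 2, 6};
        Arrays.sort(hits); // same purpose as step 2.1
        for (int[] pair : toSegmentLocal(hits, docBase, maxDoc)) {
            System.out.println("segment=" + pair[0] + " localDocId=" + pair[1]);
        }
    }
}
```

Lucene also provides ReaderUtil.subIndex(int, List<LeafReaderContext>), which finds the leaf for a single doc ID by binary search and does not require sorting the hits first; whether that is preferable depends on how many hits are in the rescore window.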

3) Now, with this magic "docId", you can fetch the value from the IFD inside the for loop:

 SortedBinaryDocValues values = rescoreContext.idField.load(currentReaderContext).getBytesValues();
 values.advanceExact(docId);
 String id = values.nextValue().utf8ToString();

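In your case, instead of the _id field, get the IFD for the field you need, and build a HashMap from docId -> string value inside the for loop. Then use this map in the same for loop where you apply the score.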
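
As a rough illustration of that last step, the sketch below applies new scores from such a map and then restores score order, since the hits were sorted by doc ID earlier. It runs without Lucene: Hit is a hypothetical stand-in for ScoreDoc, the scorer passed in stands in for MovieScorer.calculateScore, and the map is keyed by the index-wide hit.doc (an assumption made here because segment-local docId values repeat across segments).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class RescoreByFieldSketch {

    // Stand-in for Lucene's ScoreDoc (hypothetical, for illustration only).
    static final class Hit {
        final int doc;
        float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Overwrites each hit's score using the field value looked up by doc ID,
    // then sorts best score first, breaking ties by ascending doc ID.
    // Assumes the map holds a value for every hit.
    static Hit[] rescore(Hit[] hits, Map<Integer, String> fieldByDoc, Function<String, Float> scorer) {
        for (Hit hit : hits) {
            hit.score = scorer.apply(fieldByDoc.get(hit.doc));
        }
        Arrays.sort(hits, (a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        return hits;
    }

    public static void main(String[] args) {
        // In the plugin this map would be filled from the doc values inside the segment loop.
        Map<Integer, String> movieNameByDoc = new HashMap<>();
        movieNameByDoc.put(2, "The Star Wars");
        movieNameByDoc.put(6, "Star Wars");
        movieNameByDoc.put(9, "Starr Warz");

        Hit[] hits = {new Hit(2, 0f), new Hit(6, 0f), new Hit(9, 0f)};

        // Placeholder scorer: exact match scores 1, otherwise penalize the length difference.
        String input = "Star Wars";
        rescore(hits, movieNameByDoc,
                name -> name.equals(input) ? 1.0f : 1.0f / (1 + Math.abs(name.length() - input.length())));

        for (Hit hit : hits) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

In the real rescorer the map may be unnecessary if the score is computed directly inside the segment loop; what matters is re-sorting by score before returning topDocs.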

I hope this helps everyone! This technique is not documented at all and is not explained anywhere!

Getting the field value of documents returned in the results inside an Elasticsearch plugin
