Question

For example, suppose I have the following documents:

1. Casa Road
2. Jalan Casa

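Say my query term is "cas". When I search, both documents get the same score. I want the one in which casa appears earlier (document 1 here) to rank first in my query output.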

I am using an edgeNGram analyzer. I am also using aggregations, so I cannot use the normal sorting that happens after the query.

Answer 1

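This is a bit more involved, but it should work. Basically, you need the position of the term within the text itself, as well as the number of terms in the text. The actual score is computed with a script, so you need to enable dynamic scripting in the elasticsearch.yml config file: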

script.engine.groovy.inline.search: on

This is what you need:

A mapping with term_vector set to with_positions, an edgeNGram analyzer, and a sub-field of type token_count:

PUT /test
{
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions",
          "index_analyzer": "edgengram_analyzer",
          "search_analyzer": "keyword",
          "fields": {
            "word_count": {
              "type": "token_count",
              "store": "yes",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "min_gram": "2",
          "type": "edgeNGram",
          "max_gram": "30"
        }
      },
      "analyzer": {
        "edgengram_analyzer": {
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "name_ngrams"
          ],
          "tokenizer": "standard"
        }
      }
    }
  }
}
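To see why positions work here, the following Python sketch approximates what this mapping indexes for the two test documents. It rests on assumptions: a simple `\w+` regex stands in for the standard tokenizer, and every edge n-gram of a word is given that word's position (my understanding of how the edgeNGram filter behaves in recent Lucene versions; the scores in the results further down are consistent with it). The function names `edge_ngrams`, `analyze` and `token_count` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def edge_ngrams(word, min_gram=2, max_gram=30):
    """Return the leading n-grams of word, shortest first (like name_ngrams)."""
    upper = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=2, max_gram=30):
    """Approximate edgengram_analyzer: lowercase, split into words, then expand
    each word into edge n-grams that all keep the word's position."""
    tokens = []
    words = re.findall(r"\w+", text.lower())
    for position, word in enumerate(words):
        for gram in edge_ngrams(word, min_gram, max_gram):
            tokens.append((gram, position))
    return tokens


def token_count(text):
    """Approximate the token_count sub-field: the number of words in text."""
    return len(re.findall(r"\w+", text))


for text in ["Casa Road", "Jalan Casa"]:
    positions = [pos for gram, pos in analyze(text) if gram == "cas"]
    print(text, positions, token_count(text))
```

Under these assumptions, "cas" sits at position 0 in "Casa Road" and at position 1 in "Jalan Casa", and both documents have a word count of 2. Those are exactly the two inputs the scoring script reads.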

Test documents:

POST /test/test/1
{"text":"Casa Road"}
POST /test/test/2
{"text":"Jalan Casa"}

The query itself:

GET /test/test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "term": {
                "text": {
                  "value": "cas"
                }
              }
            },
            "script_score": {
              "script": "termInfo=_index['text'].get('cas',_POSITIONS);wordCount=doc['text.word_count'].value;if (termInfo) {for(pos in termInfo){return (wordCount-pos.position)/wordCount}};"
            },
            "boost_mode": "sum"
          }
        }
      ]
    }
  }
}
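The Groovy script_score can be read as the following Python function. This is a hand port for illustration; `position_score` is a hypothetical name, `positions` stands in for what `_index['text'].get('cas', _POSITIONS)` yields, and `word_count` stands in for `doc['text.word_count'].value`. As in the Groovy loop, which returns inside its first iteration, only the first occurrence of the term counts. Groovy's `/` on two integers produces a decimal rather than truncating, so Python's true division `/` matches it here.

```python
from typing import Optional, Sequence


def position_score(positions: Sequence[int], word_count: int) -> Optional[float]:
    """Score a document by how early the term first appears.

    A term in the first word scores 1.0; later words score less.
    Returns None if there are no positions (in the query above this
    does not arise, because only documents containing 'cas' are scored).
    """
    first = next(iter(positions), None)
    if first is None:
        return None
    # Same formula as the script: (wordCount - pos.position) / wordCount
    return (word_count - first) / word_count


print(position_score([0], 2))  # "Casa Road": cas in the first of 2 words
print(position_score([1], 2))  # "Jalan Casa": cas in the second of 2 words
```

With `"boost_mode": "sum"`, this value is added to the score of the term query, so the document whose match comes earlier gets the larger bonus.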

And the results:

   "hits": {
      "total": 2,
      "max_score": 1.3715843,
      "hits": [
         {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 1.3715843,
            "_source": {
               "text": "Casa Road"
            }
         },
         {
            "_index": "test",
            "_type": "test",
            "_id": "2",
            "_score": 0.8715843,
            "_source": {
               "text": "Jalan Casa"
            }
         }
      ]
   }
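The two scores can be checked by hand. With `"boost_mode": "sum"`, each hit's score is the query score plus the script score. Taking the script scores as 1 for document 1 (cas at position 0 of 2 words) and 0.5 for document 2 (position 1 of 2 words), both hits are left with the same query score of 0.3715843, an inference from the numbers shown rather than from Lucene's internals:

```latex
\begin{aligned}
\text{doc 1: } & 0.3715843 + \frac{2 - 0}{2} = 1.3715843 \\
\text{doc 2: } & 0.3715843 + \frac{2 - 1}{2} = 0.8715843
\end{aligned}
```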

Answer 2

You can use a Bool Query to boost the items that start with the search query:

{
    "query" : {
        "bool" : {
            "must" : {
                "match" : { "name" : "cas" }
            },
            "should" : {
                "prefix" : { "name" : "cas" }
            }
        }
    }
}

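I am assuming the values you provided are in the name field and that the field is not analyzed. If it is analyzed, you can check this answer for more ideas.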

The way it works:

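1. Both documents match the query in the must clause and get the same score from it. A document that does not match the must query is excluded.
2. Only the document whose term starts with cas matches the query in the should clause, which gives it a higher score. A document that does not match the should query is not excluded.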
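A toy Python model of this must/should logic may help. It is a sketch under loudly stated assumptions: every matching clause contributes a fixed score of 1.0 (real Lucene scores are computed differently), the must clause is approximated as "some word starts with the term", the should clause as "the whole value starts with the term", and comparisons are case-insensitive for simplicity (a real prefix query on a not_analyzed field is case-sensitive). The function names and the third document are hypothetical.

```python
def word_prefix_match(value, term):
    """Stand-in for the must clause: some word in the value starts with term."""
    return any(word.startswith(term) for word in value.lower().split())


def whole_prefix_match(value, term):
    """Stand-in for the should clause: the whole value starts with term."""
    return value.lower().startswith(term)


def bool_search(docs, term, clause_score=1.0):
    """Toy bool query: must both filters and scores, should only adds score."""
    hits = []
    for doc_id, value in docs.items():
        if not word_prefix_match(value, term):
            continue  # failing the must clause excludes the document
        score = clause_score
        if whole_prefix_match(value, term):
            score += clause_score  # matching should raises the score
        hits.append((doc_id, score))
    # Highest score first; sorted() is stable, so ties keep insertion order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


docs = {"1": "Casa Road", "2": "Jalan Casa", "3": "Main Street"}
print(bool_search(docs, "cas"))
```

Document 3 fails the must clause and disappears, while documents 1 and 2 both survive and only document 1 picks up the extra should score.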

elasticsearch: how to rank the first occurrence of a word or phrase higher
