Question

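Use case: I have a collection of companies. Each company has city and country information. I want to run a text search to find, for example, companies in Bangkok, Thailand. All of this information must be searchable in different languages. Example: in Brazil, most people refer to Bangkok by its English name rather than by the Portuguese form, Banguecoque. So a person who wants to find companies in Bangkok, Thailand will type bangkok tailandia. Because of this requirement, I have to be able to search across fields in different languages to retrieve results.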

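The problem: when a query is sent without specifying an analyzer, Elasticsearch uses the search_analyzer specified in each field's configuration. This defeats the purpose of the cross_fields query. Here is the analyzer configuration: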

"query_analyzer_en": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": [ "lowercase", "asciifolding", "stopwords_en" ]
},
"query_analyzer_pt": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": [ "lowercase", "asciifolding", "stopwords_pt" ]
}

Each analyzer uses a different stop filter, one per language.
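The stopwords_en and stopwords_pt filter definitions are not shown here. A minimal sketch of what they might look like, assuming they rely on Elasticsearch's predefined _english_ and _portuguese_ stop-word lists (the real definitions may use custom lists instead):

```json
{
    "stopwords_en": {
        "type": "stop",
        "stopwords": "_english_"
    },
    "stopwords_pt": {
        "type": "stop",
        "stopwords": "_portuguese_"
    }
}
```

These entries would sit under the filter section of the index analysis settings, next to the analyzers above.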

Here is the field configuration:

"dynamic_templates": [{
    "english": {
        "match": "*_txt_en",
        "match_mapping_type": "string",
        "mapping": {
            "type": "string",
            "analyzer": "index_analyzer_en",
            "search_analyzer": "query_analyzer_en"
        }
    }
}, {
    "portuguese": {
        "match": "*_txt_pt",
        "match_mapping_type": "string",
        "mapping": {
            "type": "string",
            "analyzer": "index_analyzer_pt",
            "search_analyzer": "query_analyzer_pt"
        }
    }
}]

This is the query I am using:

{
   "query": {
      "multi_match" : {
        "query" : "bangkok tailandia",
        "type"  : "cross_fields",
        "operator":   "and",
        "fields" : [ "city_txt_en", "country_txt_pt" ],
        "tie_breaker": 0.0
      }
   },
   "profile": true
}

Profiling the query shows the following Lucene query:

(+city_txt_en:bangkok +city_txt_en:tailandia) 
(+country_txt_pt:bangkok +country_txt_pt:tailandia)

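This does not work as intended, because Elasticsearch requires both terms to match within a single field, either city or country. The problem is that bangkok is English while tailandia is Portuguese, so neither field contains both.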

If I set an analyzer on the query itself, the Lucene query comes out the way I expect:

+(city_txt_en:bangkok | country_txt_pt:bangkok) 
+(city_txt_en:tailandia | country_txt_pt:tailandia)
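For reference, a sketch of that variant: the same multi_match with an explicit analyzer parameter. Which analyzer was set is not stated, so query_analyzer_en here is an assumption chosen only for illustration.

```json
{
   "query": {
      "multi_match": {
        "query": "bangkok tailandia",
        "type": "cross_fields",
        "operator": "and",
        "analyzer": "query_analyzer_en",
        "fields": [ "city_txt_en", "country_txt_pt" ]
      }
   },
   "profile": true
}
```

With a single analyzer for all fields, cross_fields can treat both fields as one group, which is why the terms end up combined per term rather than per field.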

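But now the problem is that I have to use the same query analyzer for both languages. I need a way to generate the Lucene query above while still using a different query analyzer for each language.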

Answer 1

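You should be able to achieve this with a query_string query. The query string is split into terms, and each term is then applied to every field using that field's own analyzer.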
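The answer's original example is not preserved here. A minimal sketch of such a request, assuming the field names from the question and default_operator set to AND so that every term is required:

```json
{
   "query": {
      "query_string": {
        "query": "bangkok tailandia",
        "fields": [ "city_txt_en", "country_txt_pt" ],
        "default_operator": "AND"
      }
   },
   "profile": true
}
```

How query_string splits and groups terms has changed between Elasticsearch versions, so it is worth checking the profile output to confirm that the generated Lucene query has the +(city | country) +(city | country) shape the question asks for.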

Answer 2

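According to the documentation, cross_fields only works term by term across fields that share the same analyzer. Fields with different analyzers are put into separate groups, which is why the query in the question is evaluated per field.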

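What you can do, however, is split the query into two parts, each with, for example, a 50% chance of matching. Here you could use plain match, since each multi_match has a single field, but you can also add other fields with the same analyzer to each sub-query.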

{
    "bool": {
        "should": [
            {
              "multi_match" : {
                "query" : "bangkok tailandia",
                "type":       "cross_fields",
                "operator":   "and",
                "fields" : [ "city_txt_en" ],
                "minimum_should_match": "50%" 
              }
            },
            {
              "multi_match" : {
                "query" : "bangkok tailandia",
                "type":       "cross_fields",
                "operator":   "and",
                "fields" : [ "country_txt_pt" ]
              }
            }
        ]
    }
}
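A variation on the same idea, offered as an untested sketch and not taken from the documentation: split the input by term instead of by field. This assumes the client splits the search text on whitespace before building the request. Each term gets its own multi_match of the default best_fields type, so every field analyzes the term with its own search_analyzer, and bool must requires every term to match in at least one field.

```json
{
    "query": {
        "bool": {
            "must": [
                {
                  "multi_match": {
                    "query": "bangkok",
                    "fields": [ "city_txt_en", "country_txt_pt" ]
                  }
                },
                {
                  "multi_match": {
                    "query": "tailandia",
                    "fields": [ "city_txt_en", "country_txt_pt" ]
                  }
                }
            ]
        }
    }
}
```

One caveat with this approach: a term that every field's analyzer removes as a stop word produces a clause that matches nothing, so such terms would need to be dropped on the client side.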

Elasticsearch multi-match cross fields query with different query analyzers
