Question

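I have spent a lot of time trying to find the best way to build a city autocomplete that supports multiple languages (ES/EN), fuzziness, and priority for exact matches (shown at the top of the results), but I can't find a good way to accomplish this.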

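My current solution works well in many cases, but when I search for Roma, the first result is "Iasi-East Romania, Romania", while Roma, Italy sits around position 30 even though it is an exact match.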

Result JSON:

[
  {
    "_index": "destinations",
    "_type": "doc",
    "_id": "_X80XWcBn2nzTu98N7_F",
    "_score": 75.50012,
    "_source": {
      "destination_name_en": "Iasi-East Romania",
      "destination_name_es": "Iasi-East Romania",
      "destination_name_pt": "Iasi-East Romania",
      "country_code": "RO",
      "country_name": "ROMANIA",
      "destination_id": 7953,
      "popularity": "0"
    }
  },
  {
    "_index": "destinations",
    "_type": "doc",
    "_id": "7380XWcBn2nzTu98OMZl",
    "_score": 73.116455,
    "_source": {
      "destination_name_en": "La Romana",
      "destination_name_es": "La Romana",
      "destination_name_pt": "La Romana",
      "country_code": "DO",
      "country_name": "DOMINICAN REPUBLIC",
      "destination_id": 2816,
      "popularity": "0"
    }
  },
  {
    "_index": "destinations",
    "_type": "doc",
    "_id": "1X80XWcBn2nzTu98OMZl",
    "_score": 71.4391,
    "_source": ...
  },
  ...
  {
    "_index": "destinations",
    "_type": "doc",
    "_id": "8H80XWcBn2nzTu98OMZl",
    "_score": 52.018818,
    "_source": {
      "destination_name_en": "Rome",
      "destination_name_es": "Roma",
      "destination_name_pt": "Roma",
      "country_code": "IT",
      "country_name": "ITALY",
      "destination_id": 6338,
      "popularity": "0"
    }
  }
]

This is my best attempt so far.

Settings and mapping:

'settings' => [
    'analysis' => [
        'filter' => [
            'autocomplete_filter' => [
                "type" => "edge_ngram",
                "min_gram" => 1,
                "max_gram" => 20,
            ]
        ],
        'analyzer' => [
            'autocomplete' => [
                "type" => "custom",
                'tokenizer' => "standard",
                'filter' => ['lowercase', 'asciifolding', 'autocomplete_filter'],
            ]
        ],
    ],
],
'mappings' => [
    'doc' => [
        "properties" => [
            "destination_name_en" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "destination_name_es" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "destination_name_pt" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "popularity" => [
                "type" => "integer",
            ]
        ]
    ]
]

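In addition, I would like to use the popularity value to boost specific destinations.

I hope someone can give me an example or point me in the right direction.

I would really appreciate it.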

Answer 1

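The problem is that when you search for roma, Iasi-East Romania comes first because it contains "roma" (as a prefix of "Romania") in every language field. Roma, on the other hand, only matches Rome in ES/PT/IT and not in EN, where the name is "Rome".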
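To make the effect concrete, here is a rough sketch that imitates the `autocomplete` analyzer from the question (word splitting, lowercasing, edge n-grams) and counts how many language fields contain the token `roma`. The function names are hypothetical, and the real ranking also depends on Elasticsearch's relevance scoring, so treat this only as an illustration of why a `most_fields` query can favor the Romania entry.

```php
<?php
// Rough stand-in for the 'autocomplete' analyzer: split on non-alphanumerics,
// lowercase, then emit edge n-grams of each word. asciifolding is skipped,
// which does not matter for these ASCII names.
function autocompleteTokens(string $text, int $minGram = 1, int $maxGram = 20): array
{
    $tokens = [];
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $limit = min(strlen($word), $maxGram);
        for ($len = $minGram; $len <= $limit; $len++) {
            $tokens[] = substr($word, 0, $len);
        }
    }
    return array_values(array_unique($tokens));
}

// Counts the fields whose indexed tokens contain the (lowercased) query term.
function countMatchingFields(string $query, array $doc): int
{
    $count = 0;
    foreach ($doc as $value) {
        if (in_array(strtolower($query), autocompleteTokens($value), true)) {
            $count++;
        }
    }
    return $count;
}

$iasi = [
    'destination_name_en' => 'Iasi-East Romania',
    'destination_name_es' => 'Iasi-East Romania',
    'destination_name_pt' => 'Iasi-East Romania',
];
$rome = [
    'destination_name_en' => 'Rome',
    'destination_name_es' => 'Roma',
    'destination_name_pt' => 'Roma',
];

echo countMatchingFields('roma', $iasi), "\n"; // 3
echo countMatchingFields('roma', $rome), "\n"; // 2
```

Three matching fields against two means the Romania entry collects one more per-field score than Rome does.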

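So if you want to boost exact matches, you need to index the city name in an additional field without autocomplete (for every language) and add a new clause on those fields to the should.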

Mapping example:

"properties" => [
    "destination_name_en" => [
        "type" => "text",
        "analyzer" => "autocomplete",
        "search_analyzer" => "standard",
        "fields" => [
            "exact" => [
                "type" => "text",
                "analyzer" => "standard", // you could use a fancier analyzer here
            ]
        ]
    ],
    ....
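The `....` stands for the remaining language fields. Assuming they follow the same pattern as `destination_name_en` (the answer says the extra field is needed for all languages), one way to avoid repeating the block is to build the properties in a loop. `nameFieldMapping()` is a hypothetical helper, not part of any client library.

```php
<?php
// Hypothetical helper: an autocomplete text field with an "exact" sub-field
// that is analyzed without edge n-grams.
function nameFieldMapping(): array
{
    return [
        'type' => 'text',
        'analyzer' => 'autocomplete',
        'search_analyzer' => 'standard',
        'fields' => [
            'exact' => [
                'type' => 'text',
                'analyzer' => 'standard',
            ],
        ],
    ];
}

$properties = [];
foreach (['en', 'es', 'pt'] as $lang) {
    // Field names taken from the question's mapping.
    $properties["destination_name_{$lang}"] = nameFieldMapping();
}
$properties['popularity'] = ['type' => 'integer'];

echo json_encode($properties, JSON_PRETTY_PRINT), "\n";
```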

And in the query:

'query' => [
    "bool" => [
        "should" => [
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "type" => "most_fields",
                    "boost" => 2
                ]
            ],
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "fuzziness" => "1",
                    "prefix_length" => 2
                ]
            ],
            [
                // new clause: boost matches on the non-autocomplete sub-fields
                "multi_match" => [
                    "query" => $text,
                    "type" => "most_fields",
                    "fields" => [
                        "destination_name_*.exact"
                    ],
                    "boost" => 2
                ]
            ]
        ]
    ]
]
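As a quick sanity check on why the extra clause helps, the sketch below imitates the `standard` analyzer on the `.exact` sub-fields (whole lowercase words, no n-grams) and counts which names contain `roma` as a complete token. The function names are hypothetical and the tokenization is simplified to ASCII names.

```php
<?php
// Simplified stand-in for the 'standard' analyzer: lowercase whole words only.
function exactTokens(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Counts the names in which the query appears as a complete token.
function exactFieldMatches(string $query, array $names): int
{
    $hits = 0;
    foreach ($names as $name) {
        if (in_array(strtolower($query), exactTokens($name), true)) {
            $hits++;
        }
    }
    return $hits;
}

// Names in the order en, es, pt, as in the result JSON from the question.
echo exactFieldMatches('roma', ['Iasi-East Romania', 'Iasi-East Romania', 'Iasi-East Romania']), "\n"; // 0
echo exactFieldMatches('roma', ['Rome', 'Roma', 'Roma']), "\n"; // 2
```

Only the Rome document matches the `.exact` clause, so only it receives the additional boosted score.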

Could you try something like this and keep us posted?

Answer 2

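This works like a charm! Now I get Roma as the first result, and typos at the end of the word are tolerated too: "Romi" also returns Roma as the first result.

Now I'm trying to boost results by popularity (I have two Romas, Roma - Italy and Roma - Australia), and I would also like to boost some of the world's most popular cities.

I'm using function_score, but it gives me very strange results.

This is my current code: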

'query' => [
    'function_score' => [
        'field_value_factor' => [
            'field' => 'popularity',
        ],
        "score_mode" => "multiply",
        'query' => [
            "bool" => [
                "should" => [
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "type" => "most_fields",
                            "boost" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "fuzziness" => "1",
                            "prefix_length" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*.exact"
                            ],
                            "boost" => 2
                        ]
                    ]
                ]
            ]
        ]
    ],
],

Any suggestions?
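One possible explanation for the odd scores, assuming most documents still have popularity 0 as in the result JSON from the question: with the default modifier and the default boost_mode (multiply), field_value_factor multiplies the query score by the raw field value, so a popularity of 0 turns the score into 0. Note also that score_mode only controls how several functions are combined with each other, while boost_mode controls how the function result is combined with the query score. The sketch below mimics that arithmetic and compares it with the `log1p` modifier plus `boost_mode` set to `sum`. `boostedScore()` is a hypothetical helper that models only these two options, and `$boolQuery` is a placeholder for the bool/should query.

```php
<?php
// Simplified model of field_value_factor: apply the modifier to the field
// value, then combine the result with the query score according to boost_mode.
// Only 'none'/'log1p' and 'multiply'/'sum' are modelled here.
function boostedScore(
    float $queryScore,
    float $popularity,
    string $modifier = 'none',
    string $boostMode = 'multiply'
): float {
    $value = $modifier === 'log1p' ? log10(1 + $popularity) : $popularity;
    return $boostMode === 'sum' ? $queryScore + $value : $queryScore * $value;
}

// Defaults: a popularity of 0 wipes out the query score.
echo boostedScore(52.018818, 0), "\n";                 // 0
// log1p + sum: popularity 0 leaves the score unchanged, higher values add a little.
echo boostedScore(52.018818, 0, 'log1p', 'sum'), "\n"; // 52.018818
echo boostedScore(52.018818, 9, 'log1p', 'sum'), "\n"; // 53.018818

// A function_score body using those options.
$boolQuery = ['match_all' => new stdClass()]; // placeholder for the bool/should query
$query = [
    'function_score' => [
        'query' => $boolQuery,
        'field_value_factor' => [
            'field' => 'popularity',
            'modifier' => 'log1p',
            'missing' => 0,
        ],
        'boost_mode' => 'sum',
    ],
];
echo json_encode($query), "\n";
```

Whether `sum` or a damped `multiply` ranks better depends on how the popularity values are distributed, so this is a starting point to experiment with rather than a recommendation.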

P.S.: Thank you very much for your help. I'm marking yours as the best answer, since you solved the main problem.

Exact match and fuzziness... what is a good approach?
