Using the phrase suggester with the phonetic plugin in Elasticsearch 5.5

Date: 2018-07-20 19:29:32

Tags: elasticsearch elasticsearch-5 elasticsearch-plugin elasticsearch-phonetic

I'm building a search feature for a web application, using Elasticsearch 5.5.3 (I can't use a newer version; the client requires 5.5.3).

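I'm also using the phrase suggester. Right now the application searches on the given string only, but if a suggestion is available it replaces the search text with the suggested string (that is, the original string is not included in the search).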
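
The replacement logic described above could be sketched roughly as follows. This is a minimal sketch, not the application's actual code: the function name `pick_search_text` is hypothetical, and it assumes the response is the plain dict returned by the search API, with the suggester named `suggestion` as in the query in this question.

```python
from typing import Any, Dict


def pick_search_text(original_text: str, response: Dict[str, Any]) -> str:
    """Return the top phrase suggestion if one exists, else the original text."""
    # The "suggest" section is keyed by suggester name; each entry carries
    # a list of options for the submitted text.
    entries = response.get("suggest", {}).get("suggestion", [])
    for entry in entries:
        options = entry.get("options", [])
        if options:
            # Take the first option returned (the query asks for "size": 1).
            return options[0]["text"]
    # No suggestion available: search on what the user typed.
    return original_text
```

With `"size": 1` in the suggester, at most one option comes back, so taking the first option is enough.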

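The problem is that I'm trying to use the phonetic plugin so that strings with a similar pronunciation are suggested (for example, if someone searches for "Shile", it should suggest "Chile", because the two are pronounced the same in Spanish). I haven't managed to get this working, and I'm not sure what I'm doing wrong.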
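
One way to narrow this down might be to check which tokens the analyzer actually emits for both spellings, using the `_analyze` API on the index. The sketch below only builds the request bodies. The helper name `analyze_body` is hypothetical, and actually sending the requests (for example with the Python client's `indices.analyze`) is left out. It assumes the index is `searches` and the analyzer is `metaphone_analyzer`, as in the settings in this question.

```python
from typing import Dict


def analyze_body(text: str, analyzer: str = "metaphone_analyzer") -> Dict[str, str]:
    """Build a request body for GET /searches/_analyze."""
    return {"analyzer": analyzer, "text": text}


# One body per spelling, so the two token lists can be compared side by side.
bodies = [analyze_body(word) for word in ("Shile", "Chile")]
```

Comparing the two token lists would show whether the analyzer treats both spellings alike. Note that in the mapping shown in this question, no field references `metaphone_analyzer` yet.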

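Here is my mapping as it was before I tried to implement this (although it already includes the analyzer I created for the purpose, "metaphone_analyzer"):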

{
    "searches": {
        "aliases": {},
        "mappings": {
            "document": {
                "properties": {
                    "description": {
                        "type": "text",
                        "store": true
                    },
                    "detail": {
                        "type": "text",
                        "store": true,
                        "copy_to": [
                            "suggest_field"
                        ]
                    },
                    "language_code": {
                        "type": "text"
                    },
                    "name": {
                        "type": "text",
                        "store": true,
                        "copy_to": [
                            "suggest_field"
                        ]
                    },
                    "suggest_field": {
                        "type": "text"
                    },
                    "title": {
                        "type": "text",
                        "store": true,
                        "copy_to": [
                            "suggest_field"
                        ]
                    }
                }
            }
        },
        "settings": {
            "index": {
                "number_of_shards": "5",
                "provided_name": "searches",
                "creation_date": "1532112095449",
                "analysis": {
                    "filter": {
                        "shingle_filter": {
                            "max_shingle_size": "3",
                            "min_shingle_size": "2",
                            "type": "shingle"
                        },
                        "dbl_metaphone": {
                            "type": "phonetic",
                            "encoder": "double_metaphone"
                        },
                        "es_filter": {
                            "type": "stop",
                            "stopwords": "_spanish_"
                        },
                        "en_filter": {
                            "type": "stop",
                            "stopwords": "_english_"
                        }
                    },
                    "analyzer": {
                        "default": {
                            "filter": [
                                "lowercase",
                                "asciifolding",
                                "shingle_filter"
                            ],
                            "char_filter": [
                                "html_strip"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "en_analyzer": {
                            "filter": [
                                "lowercase",
                                "asciifolding",
                                "en_filter"
                            ],
                            "char_filter": [
                                "html_strip"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "es_analyzer": {
                            "filter": [
                                "lowercase",
                                "asciifolding",
                                "es_filter"
                            ],
                            "char_filter": [
                                "html_strip"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "metaphone_analyzer": {
                            "filter": [
                                "lowercase",
                                "asciifolding",
                                "shingle_filter",
                                "dbl_metaphone"
                            ],
                            "char_filter": [
                                "html_strip"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "SDykW954Q3egFIJEi1aA5w",
                "version": {
                    "created": "5050399"
                }
            }
        }
    }
}

Here is my query:

{
    "suggest": {
        "suggestion": {
            "text": "search text",
            "phrase": {
                "max_errors": 2,
                "field": "suggest_field",
                "direct_generator": [
                    {
                        "suggest_mode": "missing",
                        "field": "suggest_field"
                    }
                ],
                "gram_size": 3,
                "size": 1,
                "highlight": {
                    "pre_tag": "<strong>",
                    "post_tag": "</strong>"
                }
            }
        }
    },
    "highlight": {
        ...
    },
    "query": {
        ...
    }
}

Basically, I copy several fields into the "suggest_field" field so that I can run the suggester over multiple fields at once.

So, any ideas or pointers on which direction I should take?

Thanks.

PS: For the record, this is a Django web application and I'm using elasticsearch-dsl, but for simplicity I've written out the final query here.

0 answers:

No answers yet