Email autocomplete in Elasticsearch is not working

Asked: 2016-11-28 14:39:48

Tags: elasticsearch

I have a field with the following mapping:

"my_field": {
    "properties": {
        "address": {
            "type": "string",
            "analyzer": "email",
            "search_analyzer": "whitespace"
        }
    }
}

My email analyzer looks like this:

{
    "analysis": {
        "filter": {
            "email_filter": {
                "type": "edge_ngram",
                "min_gram": "3",
                "max_gram": "255"
            }
        },
        "analyzer": {
            "email": {
                "type": "custom",
                "filter": [
                    "lowercase",
                    "email_filter",
                    "unique"
                ],
                "tokenizer": "uax_url_email"
            }
        }
    }
}

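When I search for an email ID such as test.xyz@example.com:

Partial terms such as tes or test.xy do not match anything. However, searching for test.xyz or test.xyz@example.com works fine. I ran the address through my email analyzer, and the tokens come out as expected.

For example, requesting http://localhost:9200/my_index/_analyze?analyzer=email&text=test.xyz@example.com

returns: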

{
    "tokens": [{
        "token": "tes",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.x",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xy",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@e",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@ex",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@exa",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@exam",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@examp",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@exampl",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.c",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.co",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.com",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }]
}
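
The token list above can be reproduced offline. The following is a minimal sketch in Python, not Elasticsearch code: `edge_ngrams` and `email_analyzer` are hypothetical helper names, and the sketch assumes the input is a single email address, so that the uax_url_email tokenizer keeps it as one token.

```python
def edge_ngrams(token, min_gram=3, max_gram=255):
    """Return the prefixes of `token` whose lengths run from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def email_analyzer(text):
    """Rough stand-in for the `email` analyzer: lowercase, edge n-grams, unique."""
    # Assumption: `text` is one email address, i.e. a single token.
    grams = edge_ngrams(text.lower())
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(grams))


tokens = email_analyzer("test.xyz@example.com")
print(tokens[0], tokens[-1], len(tokens))
```

With min_gram set to 3, the shortest indexed term is tes, so one- or two-character prefixes would never match this field.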

So I know the tokenization works. But at search time, partial strings do not match.

For example, requesting http://localhost:9200/my_index/my_field/_search?q=test returns no hits.

My index details:

{
    "my_index": {
        "aliases": {
            "alias_default": {}
        },
        "mappings": {
            "my_field": {
                "properties": {
                    "address": {
                        "type": "string",
                        "analyzer": "email",
                        "search_analyzer": "whitespace"
                    },
                    "boost": {
                        "type": "long"
                    },
                    "createdat": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_millis"
                    },
                    "instanceid": {
                        "type": "long"
                    },
                    "isdeleted": {
                        "type": "integer"
                    },
                    "object": {
                        "type": "string"
                    },
                    "objecthash": {
                        "type": "string"
                    },
                    "objectid": {
                        "type": "string"
                    },
                    "parent": {
                        "type": "short"
                    },
                    "parentid": {
                        "type": "integer"
                    },
                    "updatedat": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_millis"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1480342980403",
                "number_of_replicas": "1",
                "max_result_window": "100000",
                "uuid": "OUuiTma8CA2VNtw9Og",
                "analysis": {
                    "filter": {
                        "email_filter": {
                            "type": "edge_ngram",
                            "min_gram": "3",
                            "max_gram": "255"
                        },
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "3",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete": {
                            "type": "custom",
                            "filter": [
                                "lowercase",
                                "autocomplete_filter"
                            ],
                            "tokenizer": "standard"
                        },
                        "email": {
                            "type": "custom",
                            "filter": [
                                "lowercase",
                                "email_filter",
                                "unique"
                            ],
                            "tokenizer": "uax_url_email"
                        }
                    }
                },
                "number_of_shards": "5",
                "version": {
                    "created": "2010099"
                }
            }
        },
        "warmers": {}
    }
}

1 Answer:

Answer 0 (score: 1)

OK, everything looks correct except your query.

You simply need to specify the address field in your query and it will work:

http://localhost:9200/my_index/my_field/_search?q=address:test
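
The same field-scoped search can also be sent as a request body. Below is a minimal sketch using only the Python standard library. It builds the URL and a roughly equivalent `match` query but does not send anything. `uri_search_url` and `match_body` are hypothetical helper names, and the host, index and type are taken from the question.

```python
import json
from urllib.parse import quote

BASE = "http://localhost:9200/my_index/my_field/_search"


def uri_search_url(field, text):
    """Build a URI search of the form ?q=field:text."""
    # Keep ":" readable; other reserved characters such as "@" are percent-encoded.
    query = quote(field + ":" + text, safe=":")
    return BASE + "?q=" + query


def match_body(field, text):
    """Build a request body that scopes a match query to one field."""
    return json.dumps({"query": {"match": {field: text}}})


print(uri_search_url("address", "test"))
print(match_body("address", "test"))
```

Either way, the query text is analyzed with the field's search_analyzer, which is whitespace here. That analyzer does not lowercase, so a query such as Test would probably not match the lowercased n-grams.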

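If you do not specify the address field, the query runs against the _all field, which is analyzed with its own default analyzer rather than your email analyzer. The edge n-grams exist only in the address field, which is why a partial term such as test finds nothing.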
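
The difference between the two fields can be illustrated with a small simulation. This is a sketch under stated assumptions, not Elasticsearch code: it assumes `_all` is analyzed with the standard analyzer, and it approximates that analyzer for this one address by splitting at "@" (the standard tokenizer keeps a dot between letters). The names `address_terms`, `all_terms` and `results` are made up for the illustration.

```python
EMAIL = "test.xyz@example.com"

# Terms indexed for `address`: lowercased edge n-grams, min_gram = 3.
address_terms = {EMAIL.lower()[:n] for n in range(3, len(EMAIL) + 1)}

# Assumed terms indexed for `_all`: whole words only, no prefixes.
# Splitting at "@" is a rough approximation that holds for this address.
all_terms = set(EMAIL.lower().split("@"))

# A term query hits only when the exact term exists in the index.
results = {}
for q in ["tes", "test", "test.xy", "test.xyz"]:
    results[q] = (q in address_terms, q in all_terms)
    print(q, "address:", results[q][0], "_all:", results[q][1])
```

Under these assumptions only test.xyz is found in `_all`, which is consistent with the behaviour described in the question, while every prefix of three or more characters is found in `address`.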