What is the best configuration for querying full names in Elasticsearch?

Asked: 2019-02-15 15:52:28

Tags: elasticsearch

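I'm trying to set up search analysis in Elasticsearch. I've tried many combinations without success, and now I'm not sure whether what I want is even possible:

Suppose I have 3 users with the following full names: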

  • John Doe
  • Johnatan Lebus
  • Jane Doe

When typing:

  • Jo should return John Doe and Johnatan Lebus
  • Ja should return Jane Doe
  • doe should return Jane Doe and John Doe
  • doe john should return only John Doe, not Jane Doe

Is the last case possible, and if so, what should the configuration be?

This is what I currently have:

 "analysis": {
                    "analyzer": {
                        "keyword_analyzer": {
                            "char_filter": [],
                            "filter": [
                                "lowercase",
                                "asciifolding",
                                "trim"
                            ],
                            "type": "custom",
                            "tokenizer": "keyword"
                        },
                        "edge_ngram_analyzer": {
                            "filter": [
                                "lowercase"
                            ],
                            "tokenizer": "edge_ngram_tokenizer"
                        },
                        "edge_ngram_search_analyzer": {
                            "tokenizer": "lowercase"
                        }
                    },
                    "tokenizer": {
                        "edge_ngram_tokenizer": {
                            "token_chars": [
                                "letter"
                            ],
                            "min_gram": "2",
                            "type": "edge_ngram",
                            "max_gram": "5"
                        }
                    }
                },

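Thanks

1 Answer:

Answer 0: (score: 1)

I definitely think your analyzers can fit your use case; I suspect what you actually need help with is the query side.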

I set up an index with your analyzers and created a field that uses edge_ngram_analyzer:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_analyzer": {
          "char_filter": [],
          "filter": [
            "lowercase",
            "asciifolding",
            "trim"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        },
        "edge_ngram_analyzer": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "edge_ngram_tokenizer"
        },
        "edge_ngram_search_analyzer": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "token_chars": [
            "letter"
          ],
          "min_gram": "2",
          "type": "edge_ngram",
          "max_gram": "5"
        }
      }
    }
  },
  "mappings": {
    "test_doc": {
      "properties": {
        "full_name": {
          "type": "text",
          "analyzer": "edge_ngram_analyzer"
        }
      }
    }
  }
}
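
To make it easier to see which tokens this mapping puts in the index, here is a minimal Python sketch that approximates edge_ngram_analyzer outside Elasticsearch. It is only an illustration under stated assumptions: the function name edge_ngram_tokens is hypothetical, and the regex is a stand-in for "token_chars": ["letter"]. The authoritative check is to run the index's _analyze API against the real analyzer.

```python
import re

MIN_GRAM, MAX_GRAM = 2, 5  # mirrors min_gram / max_gram in the settings above


def edge_ngram_tokens(text):
    """Approximate edge_ngram_analyzer: split on non-letters, emit lowercase prefixes."""
    tokens = []
    # [^\W\d_]+ matches runs of letters only (assumed equivalent of token_chars: letter)
    for word in re.findall(r"[^\W\d_]+", text):
        word = word.lower()
        # Prefixes of length MIN_GRAM up to MAX_GRAM (or the word length, if shorter)
        for size in range(MIN_GRAM, min(MAX_GRAM, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


for name in ("John Doe", "Jane Doe", "Johnatan Lebus"):
    print(name, "->", edge_ngram_tokens(name))
```

Under these assumptions, "John Doe" and "Jane Doe" share the tokens do and doe, while only the names starting with "Jo" produce jo.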

Then I indexed a few documents:

PUT test/test_doc/1
{
  "full_name": "John Doe"
}

PUT test/test_doc/2
{
  "full_name": "Jane Doe"
}

PUT test/test_doc/3
{
  "full_name": "Johnatan Lebus"
}

Then I used the following query for your last case:

GET test/_search
{
  "query": {
    "match": {
      "full_name": {
        "operator": "and",
        "query": "doe john"
      }
    }
  }
}

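Replace the "query" value with any of the inputs above and you get the desired results. With "operator": "and", every token produced from the query text must match the document, which is why doe john returns only John Doe. The real "solution" to your problem is being more creative at query time, even though the result seems impossible when you look only at the tokens.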
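
The behavior of the query can be reasoned through with a small Python sketch. It assumes, as in the mapping above, that no separate search_analyzer is set, so the query text goes through the same edge n-gram analysis as the documents. The helpers analyze and match_and are hypothetical names that only imitate the matching logic; they ignore scoring and are not Elasticsearch APIs.

```python
import re


def analyze(text, min_gram=2, max_gram=5):
    """Rough stand-in for edge_ngram_analyzer: lowercase letter-only prefixes."""
    return {
        word[:size]
        for word in re.findall(r"[^\W\d_]+", text.lower())
        for size in range(min_gram, min(max_gram, len(word)) + 1)
    }


DOCS = ["John Doe", "Jane Doe", "Johnatan Lebus"]


def match_and(query):
    """Imitate match with operator "and": every query token must be in the document."""
    query_tokens = analyze(query)
    if not query_tokens:  # a query that yields no tokens matches nothing
        return []
    return [doc for doc in DOCS if query_tokens <= analyze(doc)]


for query in ("Jo", "Ja", "doe", "doe john"):
    print(repr(query), "->", match_and(query))
```

In this simulation "doe john" becomes the tokens do, doe, jo, joh and john, and only "John Doe" contains all of them. With the default "or" operator any single token would be enough, so all three names would match, which is why the "and" operator matters here.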

Hope this helps!