Unable to search for a string within a string in an elasticsearch index

Asked: 2012-12-07 09:40:10

Tags: elasticsearch

I am trying to set up a mapping for my elasticsearch instance that supports both full-name matching and partial-name matching:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '{
  "mappings": {
    "venue": {
      "properties": {
        "location": {
          "type": "geo_point"
        },
        "name": {
          "fields": {
            "name": {
              "type": "string",
              "analyzer": "full_name"
            },
            "partial": {
              "search_analyzer": "full_name",
              "index_analyzer": "partial_name",
              "type": "string"
            }
          },
          "type": "multi_field"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "swedish_snow": {
          "type": "snowball",
          "language": "Swedish"
        },
        "name_synonyms": {
          "type": "synonym",
          "synonyms_path": "name_synonyms.txt"
        },
        "name_ngrams": {
          "side": "front",
          "min_gram": 2,
          "max_gram": 50,
          "type": "edgeNGram"
        }
      },
      "analyzer": {
        "full_name": {
          "filter": [
            "standard",
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "partial_name": {
          "filter": [
            "swedish_snow",
            "lowercase",
            "name_synonyms",
            "name_ngrams",
            "standard"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}'

I indexed some data:

curl -XPOST 'http://127.0.0.1:9200/_bulk?pretty=1'  -d '
{"index" : {"_index" : "test", "_type" : "venue"}}
{"location" : [59.3366, 18.0315], "name" : "johnssons"}
{"index" : {"_index" : "test", "_type" : "venue"}}
{"location" : [59.3366, 18.0315], "name" : "johnsson"}
{"index" : {"_index" : "test", "_type" : "venue"}}
{"location" : [59.3366, 18.0315], "name" : "jöhnsson"}
'

Then I ran some search tests. Full name:

curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "text": {
            "name": {
              "boost": 1,
              "query": "johnsson"
            }
          }
        },
        {
          "text": {
            "name.partial": "johnsson"
          }
        }
      ]
    }
  }
}'

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.29834434,
    "hits": [
      {
        "_index": "test",
        "_type": "venue",
        "_id": "CAO-dDr2TFOuCM4pFfNDSw",
        "_score": 0.29834434,
        "_source": {
          "location": [
            59.3366,
            18.0315
          ],
          "name": "johnsson"
        }
      },
      {
        "_index": "test",
        "_type": "venue",
        "_id": "UQWGn8L9Squ5RYDMd4jqKA",
        "_score": 0.14663845,
        "_source": {
          "location": [
            59.3366,
            18.0315
          ],
          "name": "johnssons"
        }
      }
    ]
  }
}

Partial name:

curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "text": {
            "name": {
              "boost": 1,
              "query": "johns"
            }
          }
        },
        {
          "text": {
            "name.partial": "johns"
          }
        }
      ]
    }
  }
}'

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.14663845,
    "hits": [
      {
        "_index": "test",
        "_type": "venue",
        "_id": "UQWGn8L9Squ5RYDMd4jqKA",
        "_score": 0.14663845,
        "_source": {
          "location": [
            59.3366,
            18.0315
          ],
          "name": "johnssons"
        }
      },
      {
        "_index": "test",
        "_type": "venue",
        "_id": "CAO-dDr2TFOuCM4pFfNDSw",
        "_score": 0.016878016,
        "_source": {
          "location": [
            59.3366,
            18.0315
          ],
          "name": "johnsson"
        }
      }
    ]
  }
}

Name within a name:

curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "text": {
            "name": {
              "boost": 1,
              "query": "johnssons"
            }
          }
        },
        {
          "text": {
            "name.partial": "johnssons"
          }
        }
      ]
    }
  }
}'

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.39103588,
    "hits": [
      {
        "_index": "test",
        "_type": "venue",
        "_id": "UQWGn8L9Squ5RYDMd4jqKA",
        "_score": 0.39103588,
        "_source": {
          "location": [
            59.3366,
            18.0315
          ],
          "name": "johnssons"
        }
      }
    ]
  }
}

As you can see, I only get back johnssons. Shouldn't I get back both johnssons and johnsson? What am I doing wrong in my settings?

1 Answer:

Answer 0: (score: 2)

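You are using the full_name analyzer as the search analyzer for the name.partial field. As a result, your query is translated into a query for the term johnssons, which doesn't match anything in that field.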

You can use the Analyze API to see how your records are indexed. For example, this command

curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=partial_name&pretty=1' -d 'johnssons'

will show you that during indexing the string "johnssons" is translated into the following terms: "jo", "joh", "john", "johns", "johnss", "johnsso", "johnsson". While this command

curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=full_name&pretty=1' -d 'johnssons'

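will show you that during searching the string "johnssons" is translated into the term "johnssons". As you can see, there is no match between your search term and your data.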
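
The mismatch can also be reproduced without a running cluster. The following is a minimal Python sketch, not Elasticsearch code: the `edge_ngrams` helper is hypothetical and only imitates a front-anchored edgeNGram filter with min_gram 2 and max_gram 50, and it assumes the stemmer has already reduced "johnssons" to "johnsson", as the token list from the Analyze API implies.

```python
def edge_ngrams(token, min_gram=2, max_gram=50):
    """Return the front-anchored n-grams of one token, shortest first.

    Hypothetical helper that imitates an edgeNGram token filter with
    side=front; it is not part of any Elasticsearch client library.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Assumption: the index-time stemmer turned "johnssons" into "johnsson"
# before the n-gram filter ran (taken from the token list reported above).
indexed_terms = edge_ngrams("johnsson")


def matches_partial(query_term):
    """A single, unmodified search term matches only if it is an indexed term."""
    return query_term in indexed_terms


print(indexed_terms)
print("johns     ->", matches_partial("johns"))      # a stored prefix
print("johnssons ->", matches_partial("johnssons"))  # never stored
```

Because the search analyzer neither stems nor truncates the query, only query strings that happen to equal one of the stored prefixes can match name.partial. That is why "johns" finds the document while "johnssons" does not.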