Why doesn't my Elasticsearch query return text analyzed by the english analyzer?

Time: 2015-08-03 11:40:08

Tags: elasticsearch

I have an index named test_blocks:

{
  "test_blocks" : {
    "aliases" : { },
    "mappings" : {
      "block" : {
        "dynamic" : "false",
        "properties" : {
          "content" : {
            "type" : "string",
            "fields" : {
              "content_en" : {
                "type" : "string",
                "analyzer" : "english"
              }
            }
          },
          "id" : {
            "type" : "long"
          },
          "title" : {
            "type" : "string",
            "fields" : {
              "title_en" : {
                "type" : "string",
                "analyzer" : "english"
              }
            }
          },
          "user_id" : {
            "type" : "long"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1438642440687",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1070099"
        },
        "uuid" : "45vkIigXSCyvHN6g-w5kkg"
      }
    },
    "warmers" : { }
  }
}

When I search for the word killing, which appears in the content, the results come back as expected.

http://localhost:9200/test_blocks/_search?q=killing&pretty=1


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.07431685,
    "hits" : [ {
      "_index" : "test_blocks",
      "_type" : "block",
      "_id" : "218",
      "_score" : 0.07431685,
      "_source":{"block":{"id":218,"title":"The \u003ci\u003eparticle\u003c/i\u003e streak","content":"Barry Allen is a Central City police forensic scientist\n                        with a reasonably happy life, despite the childhood\n                        trauma of a mysterious red and yellow being killing his\n                        mother and framing his father. All that changes when a\n                        massive \u003cb\u003eparticle\u003c/b\u003e accelerator accident leads to Barry\n                        being struck by lightning in his lab.","user_id":82}}
    }, {
      "_index" : "test_blocks",
      "_type" : "block",
      "_id" : "219",
      "_score" : 0.07431685,
      "_source":{"block":{"id":219,"title":"The \u003ci\u003eparticle\u003c/i\u003e streak","content":"Barry Allen is a Central City police forensic scientist\n                        with a reasonably happy life, despite the childhood\n                        trauma of a mysterious red and yellow being killing his\n                        mother and framing his father. All that changes when a\n                        massive \u003cb\u003eparticle\u003c/b\u003e accelerator accident leads to Barry\n                        being struck by lightning in his lab.","user_id":83}}
    } ]
  }
}

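However, since my content field has a sub-field (content_en) that uses the english analyzer, I expected the query kill to return the same documents. It does not: I get 0 hits.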

http://localhost:9200/test_blocks/_search?q=kill&pretty=1

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Based on this analyze request, my understanding is that "killing" gets stemmed to "kill":

http://localhost:9200/_analyze?analyzer=english&text=killing

{
  "tokens" : [ {
    "token" : "kill",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

So why doesn't the query "kill" match the documents? Is my mapping incorrect, or is my search incorrect?

I am using Elasticsearch v1.7.0.

2 Answers:

Answer 0 (score: 2)

You need to use a fuzzy search (an introduction is available here):

curl -XPOST 'http://localhost:9200/test_blocks/_search' -d '
{
  "query": {
    "match": {
      "title": {
        "query": "kill",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}'

UPD. If the content_en field holds the content produced by the stemmer, it makes sense to query that field directly:

curl -XPOST 'http://localhost:9200/test_blocks/_search' -d '
{
  "query": {
    "multi_match": {
      "type": "most_fields",
      "query": "kill",
      "fields": ["block.title", "block.title.title_en"]
    }
  }
}'

Answer 1 (score: 1)

The query http://localhost:9200/_search?q=kill ends up searching the _all field.

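The _all field uses the default analyzer which, unless overridden, is the standard analyzer and not the english analyzer. The standard analyzer does not stem, so "killing" is indexed as-is and the term "kill" never matches it.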

To make the above query work, you need to set the english analyzer on the _all field and reindex, for example:

    {
      "mappings": {
        "block": {
          "_all": { "analyzer": "english" }
        }
      }
    }
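
The analyzer mismatch described above can be illustrated with a small toy model. This is only a sketch: `standard_analyze` and `toy_english_analyze` are hypothetical stand-ins written for this example, not Elasticsearch code, and the "stemmer" merely strips a trailing "ing" (the real english analyzer uses a proper stemmer and also removes stop words). The point is that a term matches only when the index-time and search-time analysis produce the same token.

```python
import re


def standard_analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def toy_english_analyze(text):
    """Toy stand-in for the english analyzer.

    NOT the real stemmer: it only strips a trailing "ing" from longer tokens.
    """
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t
            for t in standard_analyze(text)]


def matches(index_analyzer, search_analyzer, document, query):
    """A query matches when any analyzed query term is among the indexed terms."""
    indexed = set(index_analyzer(document))
    return any(term in indexed for term in search_analyzer(query))


doc = "a mysterious red and yellow being killing his mother"

# _all analyzed with the standard analyzer (the default): "killing" is indexed unchanged.
standard_kill = matches(standard_analyze, standard_analyze, doc, "kill")
standard_killing = matches(standard_analyze, standard_analyze, doc, "killing")

# _all analyzed with the (toy) english analyzer: both sides reduce to "kill".
english_kill = matches(toy_english_analyze, toy_english_analyze, doc, "kill")
english_killing = matches(toy_english_analyze, toy_english_analyze, doc, "killing")

print(standard_kill, standard_killing, english_kill, english_killing)
```

In this model the standard/standard combination finds "killing" but not "kill", which mirrors the behaviour reported in the question, while the english/english combination finds both.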

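It is also worth pointing out that the mapping in the OP does not seem to be consistent with the document structure. As @EugZol pointed out, the content sits inside a block object, so the mapping should be something along these lines: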

{
  "mappings": {
    "block": {
      "properties": {
        "block": {
          "properties": {
            "content": {
              "type": "string",
              "analyzer": "standard",
              "fields": {
                "content_en": {
                  "type": "string",
                  "analyzer": "english"
                }
              }
            },
            "id": {
              "type": "long"
            },
            "title": {
              "type": "string",
              "analyzer": "standard",
              "fields": {
                "title_en": {
                  "type": "string",
                  "analyzer": "english"
                }
              }
            },
            "user_id": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}
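
With a mapping like the one above in place and the documents reindexed, the stemmed sub-field can also be queried directly instead of going through _all. The following is a minimal sketch that only builds the request URL and body; it does not contact a cluster. The field path `block.content.content_en` is an assumption that follows from the mapping above (a multi-field is addressed as parent field, dot, sub-field name), and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Host and index name taken from the question.
BASE_URL = "http://localhost:9200/test_blocks/_search"
# Assumed path of the english-analyzed sub-field under the corrected mapping.
STEMMED_FIELD = "block.content.content_en"


def uri_search(term):
    """Build a URI search that targets the stemmed sub-field instead of _all."""
    params = {"q": "{}:{}".format(STEMMED_FIELD, term), "pretty": 1}
    return BASE_URL + "?" + urlencode(params)


def match_body(term):
    """Build the equivalent request body using a match query on the same field."""
    return json.dumps({"query": {"match": {STEMMED_FIELD: term}}}, indent=2)


url = uri_search("kill")
body = match_body("kill")
print(url)
print(body)
```

The printed body can be sent with curl -XPOST to the _search endpoint, in the same way as the queries in the first answer. Because the query text is analyzed with the field's own analyzer, both "kill" and "killing" should reduce to the same term on this field.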