Why doesn't a query for "apache" match the following document in Elasticsearch?

Posted: 2019-01-21 19:48:32

Tags: elasticsearch

I have a simple text document. Fetching it with curl -X GET "localhost:9200/customer/_doc/1" returns:

{"_index":"customer","_type":"_doc","_id":"1","_version":1,"found":true,"_source":
{
  "description": "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model."
}
}

When I run the match query below against this document, it finds no hits. Why?

{
    "query": {
        "match" : {
            "description": "apache"
        }
    }
}

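If I replace apache with createXmlDocument or with org.apache.crimson.tree.XmlDocument, the query succeeds. My initial understanding was that org.apache.crimson.tree.XmlDocument would be split into five words: org, apache, crimson, tree and XmlDocument. Now I suspect that Elasticsearch stores the whole string org.apache.crimson.tree.XmlDocument as a single term. If so, why, and how can I get the result I expect?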

1 Answer

Answer 0 (score: 3)

If you don't define a mapping, the description field is analyzed with the standard analyzer.

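The standard analyzer's tokenizer follows the Unicode word-boundary rules (UAX #29), which do not break on a period between two letters, so it produces this single token for the package name: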

{
  "token" : "org.apache.crimson.tree.xmldocument",
  "start_offset" : 140,
  "end_offset" : 175,
  "type" : "<ALPHANUM>",
  "position" : 22
}

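That is why your search finds nothing: the index contains no term apache, only the whole lowercased package name. If you use the pattern analyzer instead, the token apache is created. Its default pattern, \W+ (split on every run of non-word characters), works for your case.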
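To see locally what the pattern analyzer's defaults do to the package name, here is a rough shell approximation. It is not Elasticsearch; it only mimics "split on \W+, then lowercase" for ASCII input, treating anything other than ASCII letters, digits and underscore as a separator (which matches Java's default \W). The function name pattern_tokens is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximates the "pattern" analyzer defaults
# (pattern \W+, lowercase: true) for ASCII text. Not Elasticsearch itself.
pattern_tokens() {
  printf '%s\n' "$1" \
    | tr -cs '[:alnum:]_' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sed '/^$/d'
  # tr -cs: every run of characters that is NOT a letter, digit or "_"
  #         becomes a single newline, so each token lands on its own line.
  # sed:    drops the empty line left by leading separators, if any.
}

# One token per line: org, apache, crimson, tree, xmldocument
pattern_tokens 'org.apache.crimson.tree.XmlDocument'
```

With this splitting, apache becomes a term of its own, which is exactly what the match query for "apache" needs. The Analyze API call in the answer remains the authoritative check against the real analyzer.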

You can verify this with the Analyze API:

curl -XGET "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "text": "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model.",
  "analyzer": "pattern"
}'

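The analyzer of an existing field cannot be changed, so delete the customer index, recreate it with an explicit mapping like the one below, and index the document again. (The "_doc" level under "mappings" is Elasticsearch 6.x syntax; from 7.0 onward, omit it and put "properties" directly under "mappings".)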

PUT customer
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "description": {
          "type": "text",
          "analyzer": "pattern"
        }
      }
    }
  }
}

Running the same query again then returns a hit, for example:

  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "customer",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "description" : "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model."
        }
      }
    ]
  }