I have a simple text document with the following content:
curl -X GET "localhost:9200/customer/_doc/1"
{"_index":"customer","_type":"_doc","_id":"1","_version":1,"found":true,"_source":
{
"description": "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model."
}
}
When I run the Elasticsearch query below against this document, it finds no matches. Why is that?
{
"query": {
"match" : {
"description": "apache"
}
}
}
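However, if I replace apache with createXmlDocument or with org.apache.crimson.tree.XmlDocument, the query succeeds. My initial understanding was that org.apache.crimson.tree.XmlDocument would be split into five words (org, apache, crimson, tree and XmlDocument), but now I suspect that Elasticsearch stores the whole string org.apache.crimson.tree.XmlDocument as a single term. If so, why, and how can I get the result I expect?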
Answer 0 (score: 3)
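If you do not define an analyzer, the standard analyzer is used. Its tokenizer follows the Unicode text segmentation rules, which do not break at a period that sits between two letters, so the dotted class name is kept together and lowercased.
For your text the standard analyzer creates this token: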
{
"token" : "org.apache.crimson.tree.xmldocument",
"start_offset" : 140,
"end_offset" : 175,
"type" : "<ALPHANUM>",
"position" : 22
}
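So the index contains no apache term and your search finds nothing. If you use the pattern analyzer instead, the token apache is created. Its default pattern, \W+ (split on every run of non-word characters), should work for you.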
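To see why the standard analyzer never emits a standalone apache token, here is a rough Python sketch. It does not call Elasticsearch. The regular expression is only an ASCII approximation of the standard tokenizer's word-break rules (an assumption made for illustration, not the real implementation), and the name standard_like_tokens is hypothetical.

```python
import re

TEXT = ("Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to "
        "repeatedly access the floppy drive via the createXmlDocument method "
        "in the org.apache.crimson.tree.XmlDocument class, which violates "
        "the Java security model.")

# Assumption: a simplified, ASCII-only stand-in for the standard tokenizer.
# A run of word characters continues across a single "." or "'" as long as
# more word characters follow; anything else (spaces, "-", ",") ends the token.
STANDARD_LIKE = re.compile(r"\w+(?:[.']\w+)*", re.ASCII)


def standard_like_tokens(text):
    """Return lowercased tokens with their character offsets."""
    return [
        {"token": m.group(0).lower(),
         "start_offset": m.start(),
         "end_offset": m.end()}
        for m in STANDARD_LIKE.finditer(text)
    ]


tokens = standard_like_tokens(TEXT)
terms = [t["token"] for t in tokens]

# The dotted class name survives as one term, so "apache" alone is absent.
print("apache" in terms)
print([t for t in tokens if "apache" in t["token"]])
```

A match query for apache is analyzed into the single term apache, which is compared against whole indexed terms, so it cannot hit a term that merely contains it.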
You can check what the pattern analyzer produces with the _analyze API:
curl -XGET "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
"text": "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model.",
"analyzer": "pattern"
}'
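If no cluster is at hand, the effect of the default \W+ pattern can be approximated locally. The sketch below assumes the pattern analyzer's defaults (split on \W+, then lowercase) and mimics the shape of an _analyze response. The function name pattern_analyze is hypothetical, and real responses carry additional fields such as type.

```python
import re

TEXT = ("Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to "
        "repeatedly access the floppy drive via the createXmlDocument method "
        "in the org.apache.crimson.tree.XmlDocument class, which violates "
        "the Java security model.")


def pattern_analyze(text):
    # Splitting on runs of non-word characters (\W+) is the same as
    # collecting every run of word characters (\w+); re.ASCII keeps \w
    # limited to [A-Za-z0-9_].
    tokens = []
    for position, m in enumerate(re.finditer(r"\w+", text, flags=re.ASCII)):
        tokens.append({
            "token": m.group(0).lower(),   # lowercase, as the analyzer does by default
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        })
    return tokens


# Show only the tokens produced from the dotted class name.
for tok in pattern_analyze(TEXT):
    if tok["start_offset"] >= 140 and tok["end_offset"] <= 175:
        print(tok)
```

With this splitting the class name yields org, apache, crimson, tree and xmldocument as separate terms, which is what the original question expected. Note that version strings such as 1.4 are split as well.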
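Then create the index with an explicit mapping like the one below. The analyzer of an existing field cannot be changed, so if the customer index already exists it has to be deleted and recreated, and the document indexed again: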
PUT customer
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_doc": {
"properties": {
"description": {
"type": "text",
"analyzer": "pattern"
}
}
}
}
}
If you run your query again, you will get a hit, for example:
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"description" : "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model."
}
}
]
}