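I have an Elasticsearch index containing a field meant for exact matches. Somehow I get a lot of similar results (which I don't mind), and those similar results are sorted ahead of the exact match (which I do mind).
Can someone explain what is happening and how to fix it?
My mapping looks like this: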
"exact":{
"type":"string",
"boost":10.0,
"analyzer":"keyword"
},
My query when searching for "AAPL P JAN 2014 885,00" looks like this:
{
"size" : 21,
"query" : {
"field" : {
"exact" : "AAPL P JAN 2014 885,00"
}
},
"explain" : true,
"sort" : [ {
"_score" : {
"order" : "desc"
}
} ],
"facets" : {
"category" : {
"terms" : {
"field" : "category",
"size" : 10
}
}
}
}
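The returned documents end up in this order:
and so on, with a bunch of results ahead of the exact match.
Can someone explain to me why the exact match doesn't end up on top?
In case it helps, the search results with the full explanation are below.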
"hits" : [ {
"_shard" : 0,
"_node" : "1",
"_index" : "instruments",
"_type" : "instrument",
"_id" : "AAPL",
"_score" : 1306.8339, "_source" : {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"},
"_explanation" : {
"value" : 1306.8339,
"description" : "product of:",
"details" : [ {
"value" : 6534.169,
"description" : "sum of:",
"details" : [ {
"value" : 6534.169,
"description" : "weight(exact:AAPL in 9096), product of:",
"details" : [ {
"value" : 0.25854474,
"description" : "queryWeight(exact:AAPL), product of:",
"details" : [ {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 0.0419026,
"description" : "queryNorm"
} ]
}, {
"value" : 25272.875,
"description" : "fieldWeight(exact:AAPL in 9096), product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(termFreq(exact:AAPL)=1)"
}, {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 4096.0,
"description" : "fieldNorm(field=exact, doc=9096)"
} ]
} ]
} ]
}, {
"value" : 0.2,
"description" : "coord(1/5)"
} ]
}
}, {
"_shard" : 0,
"_node" : "1",
"_index" : "instruments",
"_type" : "instrument",
"_id" : "AAPL*PUT*20140118*675",
"_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"},
"_explanation" : {
"value" : 163.35423,
"description" : "product of:",
"details" : [ {
"value" : 816.7711,
"description" : "sum of:",
"details" : [ {
"value" : 816.7711,
"description" : "weight(exact:AAPL in 18), product of:",
"details" : [ {
"value" : 0.25854474,
"description" : "queryWeight(exact:AAPL), product of:",
"details" : [ {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 0.0419026,
"description" : "queryNorm"
} ]
}, {
"value" : 3159.1094,
"description" : "fieldWeight(exact:AAPL in 18), product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(termFreq(exact:AAPL)=1)"
}, {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 512.0,
"description" : "fieldNorm(field=exact, doc=18)"
} ]
} ]
} ]
}, {
"value" : 0.2,
"description" : "coord(1/5)"
} ]
}
}, {
"_shard" : 0,
"_node" : "1",
"_index" : "instruments",
"_type" : "instrument",
"_id" : "AAPL*CALL*20140118*500",
"_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"},
"_explanation" : {
"value" : 163.35423,
"description" : "product of:",
"details" : [ {
"value" : 816.7711,
"description" : "sum of:",
"details" : [ {
"value" : 816.7711,
"description" : "weight(exact:AAPL in 383), product of:",
"details" : [ {
"value" : 0.25854474,
"description" : "queryWeight(exact:AAPL), product of:",
"details" : [ {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 0.0419026,
"description" : "queryNorm"
} ]
}, {
"value" : 3159.1094,
"description" : "fieldWeight(exact:AAPL in 383), product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(termFreq(exact:AAPL)=1)"
}, {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 512.0,
"description" : "fieldNorm(field=exact, doc=383)"
} ]
} ]
} ]
}, {
"value" : 0.2,
"description" : "coord(1/5)"
} ]
}
}, {
"_id" : "AAPL*PUT*20140118*940",
"_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 940,00"],"id-compound":"AAPL*PUT*20140118*940"},
"_explanation" : {
"value" : 163.35423,
"description" : "product of:",
"details" : [ {
"value" : 816.7711,
"description" : "sum of:",
"details" : [ {
"value" : 816.7711,
"description" : "weight(exact:AAPL in 794), product of:",
"details" : [ {
"value" : 0.25854474,
"description" : "queryWeight(exact:AAPL), product of:",
"details" : [ {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 0.0419026,
"description" : "queryNorm"
} ]
}, {
"value" : 3159.1094,
"description" : "fieldWeight(exact:AAPL in 794), product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(termFreq(exact:AAPL)=1)"
}, {
"value" : 6.1701355,
"description" : "idf(docFreq=211, maxDocs=37299)"
}, {
"value" : 512.0,
"description" : "fieldNorm(field=exact, doc=794)"
} ]
} ]
} ]
}, {
"value" : 0.2,
"description" : "coord(1/5)"
} ]
}
}
This is what happens if I analyze the data I want to store:
curl -XGET 'localhost:9200/instruments/_analyze?field=exact&pretty=true' -d 'ING P JUN 2013 6.00'
{
"tokens" : [ {
"token" : "ING P JUN 2013 6.00",
"start_offset" : 0,
"end_offset" : 20,
"type" : "word",
"position" : 1
} ]
}
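Answer 0 (score: 2)
I'm not sure it is technically the best approach, but if you are only after one specific, known entry in Elasticsearch, you can use a filter with a script that checks for an exact match.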
{
"from" : 0,
"size" : 1,
"query" : {
"text_phrase" : {
"title" : "AAPL P JAN 2014 885,00"
}
},
"filter" : {
"script" : {
"script" : "_source.exact.contains(x)",
"params" : {
"x" : "AAPL P JAN 2014 885,00"
}
}
}
}
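I use this to fetch a single known entry from Elasticsearch, and it works well for me.
Answer 1 (score: 1)
I think you have already found the answer; I just want to add some information for others who run into the same problem.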
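You are using the field query. From the Elasticsearch docs:
Field Query:
A query that executes a query string against a specific field. It is a simplified version of the query_string query (by setting the default_field to the field this query executes against).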
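I believe the query_string query is meant for free text, i.e. it does a lot of work on the query string (splitting it into terms, making it fuzzy, and so on).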
What you want to use (and I think you found this out) is a term query, which does nothing to the search phrase and therefore only gives you exact matches.
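As a minimal sketch of the term query suggested above, the request body could be built as follows. It assumes the `exact` field from the question; `build_term_query` is a made-up helper for illustration and nothing here contacts a real cluster.

```python
import json


def build_term_query(field, value, size=21):
    """Build a search body that looks up `value` as one un-analyzed term.

    Hypothetical helper for illustration; not part of any client library.
    """
    return {
        "size": size,
        "query": {
            # A term query is not analyzed: the value is compared as-is
            # against the tokens stored in the index.
            "term": {field: value}
        },
    }


body = build_term_query("exact", "AAPL P JAN 2014 885,00")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the `_search` endpoint in place of the field query from the question.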
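Note: analysis happens at two different times, index time and query time. According to the Elasticsearch docs, setting "analyzer": "keyword" seems to affect search-time analysis only when searching "using a query string". I must admit I don't know exactly what that means (I guess it means the query_string query, but it could also mean searches like http://../_search?q=exact:{query here}).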
Answer 2 (score: 1)
Answer 3 (score: 0)
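The three option documents get exactly the same score, and you can see from the explain output that they match on "AAPL" only. That term always appears once in the document (tf = 1), and it occurs in 211 of 37299 documents (idf = 6.1701355). Because you use index-time boosting (the boost part of your mapping, 10), the field norm is much higher than usual. Since the match is always on the same field, that is not a big deal. It only means that if you had matches on other fields, this field would almost always win, which might make sense in your case.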
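The problem, though, is that when I look at your documents, AAPL P JAN 2014 885,00 is not an exact match for any of them. What I see is that only one of the 5 terms in your query matches, which is also confirmed by coord in your explain output: coord(1/5).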
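The arithmetic behind those explain trees can be replayed by hand. The sketch below multiplies together the factors exactly as they are nested in the output from the question. The function name `explain_score` is only illustrative, and small rounding differences against the reported values are expected because Lucene works with 32-bit floats.

```python
def explain_score(idf, query_norm, tf, field_norm, coord):
    """Recombine the factors printed in the explain output."""
    query_weight = idf * query_norm        # queryWeight(exact:AAPL)
    field_weight = tf * idf * field_norm   # fieldWeight(exact:AAPL)
    return query_weight * field_weight * coord


IDF = 6.1701355         # idf(docFreq=211, maxDocs=37299)
QUERY_NORM = 0.0419026  # queryNorm
COORD = 1 / 5           # only 1 of the 5 query terms matched

# Option documents report fieldNorm=512.0, the plain AAPL document 4096.0.
option_score = explain_score(IDF, QUERY_NORM, tf=1.0, field_norm=512.0, coord=COORD)
stock_score = explain_score(IDF, QUERY_NORM, tf=1.0, field_norm=4096.0, coord=COORD)

print(round(option_score, 2))
print(round(stock_score, 2))
```

Both values should land very close to the reported 163.35423 and 1306.8339. The only factor that differs between the hits is fieldNorm, and none of them scores anything for the remaining four query terms.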
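It seems the keyword analyzer is being applied, but as you can see from the returned documents, you are not sending the content of the exact field as a single value but as an array of values. Since you are using the keyword analyzer, each item is not tokenized, but you still end up with multiple tokens. I guess you should check how you are indexing your documents.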
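To illustrate the point about arrays, here is a rough simulation in plain Python (not Elasticsearch code) that treats each array element as one whole token, which is how a keyword analyzer behaves. The `_source` values are copied from the hits in the question; `keyword_tokens` is a made-up name.

```python
def keyword_tokens(field_value):
    """Mimic a keyword analyzer: every array element becomes one whole token."""
    values = field_value if isinstance(field_value, list) else [field_value]
    return set(values)


# "exact" arrays taken from the _source of the hits shown in the question.
docs = {
    "AAPL": ["APPLE INC", "US0378331005", "AAPL", "73773"],
    "AAPL*PUT*20140118*675": ["AAPL", "73773", "AAPL P JAN 2014 675,00"],
    "AAPL*CALL*20140118*500": ["AAPL", "73773", "AAPL C JAN 2014 500,00"],
    "AAPL*PUT*20140118*940": ["AAPL", "73773", "AAPL P JAN 2014 940,00"],
}

wanted = "AAPL P JAN 2014 885,00"

# Documents that hold the full string as one token.
exact_hits = [doc_id for doc_id, values in docs.items()
              if wanted in keyword_tokens(values)]

# Tokens that every returned document has in common.
shared = set.intersection(*(keyword_tokens(v) for v in docs.values()))

print(exact_hits)
print(sorted(shared))
```

None of the listed documents contains the full search string as a token, while all of them contain the token "AAPL", which is the one term the explain output reports as matching.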
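Answer 4 (score: 0)
The reason the keyword analyzer seems to be ignored in the search query is that ES tokenizes the string twice: first it runs its query DSL tokenizer, and then it runs the tokenizer specified in the mapping on the result. This is explained in more detail in this article: http://paulsabou.com/blog/2012/03/25/advanced-exact-matching-with-elastic-search/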
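A deliberately simplified model of that two-step behaviour is sketched below. It assumes the query parser splits the input on whitespace before each piece is looked up as a whole keyword token; the real parser does considerably more, and `matching_clauses` is a hypothetical name.

```python
def matching_clauses(query_text, indexed_tokens):
    """Split the query on whitespace first, then look each piece up as a whole token."""
    clauses = query_text.split()
    matched = [clause for clause in clauses if clause in indexed_tokens]
    return matched, len(clauses)


# Keyword tokens of one option document from the question.
tokens = {"AAPL", "73773", "AAPL P JAN 2014 675,00"}

matched, total = matching_clauses("AAPL P JAN 2014 885,00", tokens)
print(f"coord({len(matched)}/{total})")
```

Under this assumption only the piece "AAPL" is found among the stored tokens, which lines up with the coord(1/5) factor in the question's explain output.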