I have a question about Elasticsearch and its more_like_this query.
Given this mapping:
{
"directory.v1": {
"mappings": {
"profile.event": {
"properties": {
"event": {
"properties": {
"naics": {
"type": "nested",
"properties": {
"type": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
},
"user_id": {
"type": "long"
}
}
}
}
}
}
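I have two documents: document A, which is used as the source, and document B. I run a more_like_this query against A.
Profile A (used as the source):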
{
"_index": "directory.v1",
"_type": "profile.event",
"_id": "83731111.559",
"_score": 1,
"_source": {
"user_id": 8373,
"event": {
"naics": [
{
"value": 331,
"type": "naics"
},
{
"value": 74,
"type": "naics"
},
{
"value": 938,
"type": "naics"
},
{
"value": 2048,
"type": "naics"
},
{
"value": 939,
"type": "naics"
},
{
"value": 2049,
"type": "naics"
},
{
"value": 940,
"type": "naics"
},
{
"value": 2050,
"type": "naics"
},
{
"value": 941,
"type": "naics"
},
{
"value": 2051,
"type": "naics"
},
{
"value": 942,
"type": "naics"
},
{
"value": 2052,
"type": "naics"
},
{
"value": 943,
"type": "naics"
},
{
"value": 2053,
"type": "naics"
},
{
"value": 944,
"type": "naics"
},
{
"value": 2054,
"type": "naics"
},
{
"value": 945,
"type": "naics"
},
{
"value": 2055,
"type": "naics"
},
{
"value": 473,
"type": "naics"
},
{
"value": 128,
"type": "naics"
},
{
"value": 10,
"type": "naics"
},
{
"value": 1242,
"type": "naics"
},
{
"value": 472,
"type": "naics"
},
{
"value": 1241,
"type": "naics"
}
]
}
}
}
Profile B:
{
"_index": "directory.v1",
"_type": "profile.event",
"_id": "46124111.559",
"_score": 1,
"_source": {
"user_id": 46124,
"event": {
"naics": [
{
"value": 331,
"type": "naics"
},
{
"value": 74,
"type": "naics"
},
{
"value": 938,
"type": "naics"
},
{
"value": 2048,
"type": "naics"
},
{
"value": 939,
"type": "naics"
},
{
"value": 2049,
"type": "naics"
},
{
"value": 940,
"type": "naics"
},
{
"value": 2050,
"type": "naics"
},
{
"value": 941,
"type": "naics"
},
{
"value": 2051,
"type": "naics"
},
{
"value": 942,
"type": "naics"
},
{
"value": 2052,
"type": "naics"
},
{
"value": 943,
"type": "naics"
},
{
"value": 2053,
"type": "naics"
},
{
"value": 944,
"type": "naics"
},
{
"value": 2054,
"type": "naics"
},
{
"value": 945,
"type": "naics"
},
{
"value": 2055,
"type": "naics"
}
]
}
}
}
Every naics element in document B also appears in document A.
So I really don't understand why this query:
{
"query": {
"nested": {
"path": "event.naics",
"query": {
"more_like_this": {
"like": [
{
"_id": "83731111.559",
"_type": "profile.event"
}
],
"fields": [
"event.naics.value"
],
"min_term_freq": 1,
"min_doc_freq": 1,
"minimum_should_match": "8%"
}
}
}
}
}
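returns results,
but as soon as I raise minimum_should_match to 9% or higher, it matches nothing at all and I get no results.
I also tried something like the following, which gives me results up to 11%: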
{
"query": {
"nested": {
"path": "event.naics",
"query": {
"more_like_this": {
"like": [
{
"_id": "83731111.559",
"_type": "profile.event"
}
],
"fields": [
"event.naics.*"
],
"min_term_freq": 1,
"min_doc_freq": 1,
"minimum_should_match": "11%"
}
}
}
}
}
The term vector of the source document is:
{
"_index": "directory.v1",
"_type": "profile.event",
"_id": "83731111.559",
"_version": 5,
"found": true,
"took": 0,
"term_vectors": {}
}
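Answer 0 (score: 1)
If you fetch the term vectors of document "A" for the field event.naics.value, you will see that there are 24 terms in total, each with a term frequency of 1. So with an 8% match, 8% of 24 is 1.92, which is rounded down to 1 of the 24 generated should clauses, and you get a match. But 9% of 24 is 2.16, which rounds down to 2 clauses that must match. Since each nested document holds only one value, no single nested document can satisfy two clauses, so nothing matches.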
Term vectors request:
POST /directory.v1/profile.event/83731111.559/_termvectors
{
"fields":["event.naics.value"],
"offsets" : false,
"payloads" : false,
"positions" : false,
"term_statistics" : true,
"field_statistics" : true
}
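The rounding described in the answer can be checked with a few lines of arithmetic. This is only a sketch in Python, not Elasticsearch code: `required_clauses` is a hypothetical helper that assumes a positive percentage is truncated toward zero, as the answer describes. The 25-term case is an assumption too: it supposes that the wildcard `event.naics.*` adds the single shared `type` term (`naics`) to the 24 `value` terms, so that one nested document can satisfy two clauses. That would be consistent with the 11% threshold reported in the question, but the original post does not confirm it.

```python
def required_clauses(percent: int, optional_clauses: int) -> int:
    """Number of should clauses that must match for a positive percentage.

    Hypothetical helper: assumes the percentage of the clause count is
    truncated to an integer, as the answer above describes.
    """
    return (percent * optional_clauses) // 100


def can_match(percent: int, optional_clauses: int, max_per_nested_doc: int) -> bool:
    """A nested document matches only if it can satisfy enough clauses."""
    return required_clauses(percent, optional_clauses) <= max_per_nested_doc


# Field event.naics.value: 24 terms, each nested document holds one value.
for pct in (8, 9):
    print(pct, required_clauses(pct, 24), can_match(pct, 24, max_per_nested_doc=1))

# ASSUMPTION: event.naics.* yields 25 terms (24 values + the one "type" term),
# and a nested document can then satisfy two clauses (its value and its type).
for pct in (11, 12):
    print(pct, required_clauses(pct, 25), can_match(pct, 25, max_per_nested_doc=2))
```

Under these assumptions, 8% of 24 needs one clause and 9% needs two, which is where the first query stops matching.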