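Is it possible to run a More Like This query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html) on text inside a nested data type (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)?
The documents I want to query look like this (the data is owned by another party, so I have no control over its format):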
{
  "communicationType": "Email",
  "timestamp": 1497633308917,
  "textFields": [
    {
      "field": "Subject",
      "text": "This is the subject of the email"
    },
    {
      "field": "To",
      "text": "to-email@domain.com"
    },
    {
      "field": "Body",
      "text": "This is the body of the email"
    }
  ]
}
I want to run a More Like This query against the body of the email. Previously, the documents looked like this:
{
  "communicationType": "Email",
  "timestamp": 1497633308917,
  "textFields": {
    "subject": "This is the subject of the email",
    "to": "to-email@domain.com",
    "body": "This is the body of the email"
  }
}
and I was able to run a More Like This query against the email body like this:
{
  "query": {
    "bool": {
      "must": {
        "more_like_this": {
          "fields": ["textFields.body"],
          "like": "This is a similar body of an email",
          "min_term_freq": 1
        }
      },
      "filter": [
        { "term": { "communicationType": "Email" } },
        { "range": { "timestamp": { "gte": 1497633300000 } } }
      ]
    }
  }
}
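That data source has now been deprecated, and I need to run an equivalent query against the new data source, which stores the email body in a nested data type. I only want to compare the text against the "text" field of the entry whose "field" is "Body".
Is this possible? If so, what would the query look like? And is there a performance penalty for querying a nested data type compared with the earlier non-nested documents? Even after the timestamp and communicationType filters are applied, each query still has tens of millions of documents to compare for similar text, so performance matters.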
Answer 0 (score: 0)
Using a More Like This query inside a nested query is actually straightforward:
{
  "query": {
    "bool": {
      "must": {
        "nested": {
          "path": "textFields",
          "query": {
            "bool": {
              "must": {
                "more_like_this": {
                  "fields": ["textFields.text"],
                  "like": "This is a similar body of an email",
                  "min_term_freq": 1
                }
              },
              "filter": {
                "term": { "textFields.field": "Body" }
              }
            }
          }
        }
      },
      "filter": [
        {
          "term": {
            "communicationType": "Email"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": 1497633300000
            }
          }
        }
      ]
    }
  },
  "min_score": 2
}
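One thing the query above depends on is the index mapping: a nested query only works if textFields is mapped with type nested, and the term filters on textFields.field and communicationType match the exact values "Body" and "Email" only if those fields are not analyzed. The following is a minimal sketch of a mapping that would satisfy both conditions. It is an assumption, not the actual mapping of the data source in the question: it uses the typeless mapping syntax of Elasticsearch 7 and later (older versions need a type name under "mappings"), and the choice of keyword, text, and date with epoch_millis is a guess based on the sample document.

```json
{
  "mappings": {
    "properties": {
      "communicationType": { "type": "keyword" },
      "timestamp": { "type": "date", "format": "epoch_millis" },
      "textFields": {
        "type": "nested",
        "properties": {
          "field": { "type": "keyword" },
          "text": { "type": "text" }
        }
      }
    }
  }
}
```

If textFields.field were instead an analyzed text field, the standard analyzer would typically index the value as "body" in lowercase, and the term filter on "Body" would match nothing.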