ES newbie here. I'm trying to implement a search engine over a source using the following schema in a single index:
index:paper
{
"title": string,
"author": string,
"id": string,
"references": [string:another_paper.id, string:another_paper.id, ...],
"pubDate": date
}
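Let's say I want to search for all papers by the author "A. Smith" published between 2017-01-09 and 2017-01-30.
How do I craft my search query so that each result carries a generated field stating how many times that document is referenced by other documents in their "references" field? Is this even possible in ES?
Execution speed is not critical. I can live with relatively slow queries, but I would prefer not to update existing documents whenever new documents are uploaded.
Thanks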
Answer 0 (score: 0)
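You can definitely get results filtered by author name and date range. With the query below, you get the matching documents and, for every ID that appears in their references field, the number of matching documents that reference it.
In short, you get reference counts computed from the documents that match the query.
For example, suppose you index these 3 documents: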
{
"title": "title1",
"author": "bob",
"id": "id1",
"references": [
"id1",
"id2",
"id3"
],
"pubDate": "2018-01-02"
},
{
"title": "title2",
"author": "harry",
"id": "id2",
"references": [
"id1",
"id3",
"id7",
"id8"
],
"pubDate": "01-02-2018"
},
{
"title": "title3",
"author": "bob",
"id": "id3",
"references": [
"id1",
"id4",
"id7",
"id9"
],
"pubDate": "2018-01-03"
}
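If it helps, here is a rough sketch of how sample documents like these could be turned into a request body for the `_bulk` API. It only builds the NDJSON string and does not talk to a cluster. The helper name `build_bulk_body` is made up for illustration, and the two documents are copied from the `_source` of the hits in the result shown in this answer. Using each paper's `id` as the Elasticsearch `_id` is what that result suggests, but treat it as an assumption.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON body for the _bulk API (hypothetical helper).

    Each document becomes two lines: an action line and the source line.
    """
    lines = []
    for doc in docs:
        # Assumption: the paper's own "id" is reused as the Elasticsearch _id.
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# The two documents by "bob", as they appear in the search result.
docs = [
    {"title": "title1", "author": "bob", "id": "id1",
     "references": ["id1", "id2", "id3"], "pubDate": "2018-01-02"},
    {"title": "title3", "author": "bob", "id": "id3",
     "references": ["id1", "id4", "id7", "id9"], "pubDate": "2018-01-03"},
]

body = build_bulk_body(docs)
print(body)
```

The resulting string would be sent as the body of a POST to the index's `_bulk` endpoint with the `application/x-ndjson` content type. Whether the URL includes a type such as `type1` depends on the Elasticsearch version.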
After that, you can fire this query:
GET test_stackoverflow_agg/type1/_search
{
"query": {
"query_string": {
"query": "author:bob AND pubDate:[2018-01-02 TO 2018-01-04]"
}
},
"aggs": {
"agg1": {
"terms": {
"field": "references",
"size": 10
}
}
}
}
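The query part selects which documents to keep, and the aggregation part names the field to bucket on: for each unique ID in the references field, it counts how many of the matching documents contain it.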
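To make the semantics concrete, here is a small plain-Python imitation of what the query plus the terms aggregation computes on the three sample documents. It is not Elasticsearch code, just a sketch of the same filter-then-count logic. The function name `reference_counts` is made up, the dates for id1 and id3 are the ones shown in the search result, and the date for id2 is a guess (it does not matter here, because the author filter excludes that paper).

```python
from collections import Counter
from datetime import date

papers = [
    {"id": "id1", "author": "bob", "pubDate": date(2018, 1, 2),
     "references": ["id1", "id2", "id3"]},
    # Assumed date for id2; it is filtered out by author anyway.
    {"id": "id2", "author": "harry", "pubDate": date(2018, 1, 2),
     "references": ["id1", "id3", "id7", "id8"]},
    {"id": "id3", "author": "bob", "pubDate": date(2018, 1, 3),
     "references": ["id1", "id4", "id7", "id9"]},
]


def reference_counts(papers, author, start, end):
    """Filter by author and inclusive date range, then count referenced IDs."""
    matched = [
        p for p in papers
        if p["author"] == author and start <= p["pubDate"] <= end
    ]
    counts = Counter()
    for p in matched:
        # A terms aggregation counts documents, so each ID counts once per paper.
        counts.update(set(p["references"]))
    return matched, counts


matched, counts = reference_counts(
    papers, "bob", date(2018, 1, 2), date(2018, 1, 4)
)
print([p["id"] for p in matched])
print(sorted(counts.items()))
```

Note that only the matching papers contribute to the counts: id2 references id1 and id3 as well, but it is written by "harry", so it is not counted.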
Here is the result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.0460204,
"hits": [
{
"_index": "test_stackoverflow_agg",
"_type": "type1",
"_id": "id3",
"_score": 1.0460204,
"_source": {
"title": "title3",
"author": "bob",
"id": "id3",
"references": [
"id1",
"id4",
"id7",
"id9"
],
"pubDate": "2018-01-03"
}
},
{
"_index": "test_stackoverflow_agg",
"_type": "type1",
"_id": "id1",
"_score": 1.0460204,
"_source": {
"title": "title1",
"author": "bob",
"id": "id1",
"references": [
"id1",
"id2",
"id3"
],
"pubDate": "2018-01-02"
}
}
]
},
"aggregations": {
"agg1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "id1",
"doc_count": 2
},
{
"key": "id2",
"doc_count": 1
},
{
"key": "id3",
"doc_count": 1
},
{
"key": "id4",
"doc_count": 1
},
{
"key": "id7",
"doc_count": 1
},
{
"key": "id9",
"doc_count": 1
}
]
}
}
}
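The buckets above are counted over the matching documents only. If what you need is how often each matching paper is cited anywhere in the index, one possible approach (a sketch, not something tested against your data) is a second request: take the `_id`s of the hits, search for every document whose references contain any of them, and run a terms aggregation restricted to those IDs with `include`. The helper names, the aggregation name `cited_by` and the output field `citedBy` are all made up for illustration. The sketch also assumes `references` is mapped so that a terms aggregation works on it (for example a `keyword` field), and that the ID list is small enough for a single `terms` query.

```python
def build_cited_by_request(matched_ids, field="references"):
    """Build a search body counting, index-wide, who references the given IDs."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"terms": {field: matched_ids}},
        "aggs": {
            "cited_by": {
                "terms": {
                    "field": field,
                    # Keep only buckets for the papers from the first query.
                    "include": matched_ids,
                    "size": max(1, len(matched_ids)),
                }
            }
        },
    }


def attach_citation_counts(hits, agg_response):
    """Merge the bucket counts into the hits of the first query."""
    buckets = agg_response["aggregations"]["cited_by"]["buckets"]
    counts = {b["key"]: b["doc_count"] for b in buckets}
    return [
        dict(hit["_source"], citedBy=counts.get(hit["_id"], 0))
        for hit in hits
    ]


# Hits trimmed from the result above.
hits = [
    {"_id": "id3", "_source": {"id": "id3", "title": "title3"}},
    {"_id": "id1", "_source": {"id": "id1", "title": "title1"}},
]
request_body = build_cited_by_request([h["_id"] for h in hits])

# Hand-written response, consistent with the three sample documents:
# id1 is referenced by id1, id2 and id3; id3 is referenced by id1 and id2.
agg_response = {"aggregations": {"cited_by": {"buckets": [
    {"key": "id1", "doc_count": 3},
    {"key": "id3", "doc_count": 2},
]}}}

results = attach_citation_counts(hits, agg_response)
print(results)
```

This keeps the index untouched when new papers arrive, at the cost of one extra request per page of results. Self-citations are counted too (id1 lists itself), so exclude them in the merge step if that matters to you.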