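I need to model a website with users and articles, where each user can interact with any article multiple times (read, open, etc.). I want to model this data in a single Elasticsearch index using the following nested mapping: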
{
"mappings": {
"user": {
"properties": {
"user_id": {"type": "string"},
"interactions": {
"type": "nested",
"properties": {
"article_id": {"type": "string"},
"interact_date": {"type": "date"}
}
}
}
}
}
}
Example of an indexed document:
{
"user_id": "20",
"interactions": [
{"article_id": "111", "interact_date": "2015-01-01"},
{"article_id": "111", "interact_date": "2015-01-02"},
{"article_id": "222", "interact_date": "2015-01-01"}
]
}
I need to run the following aggregations on this data:
Number of interactions per day, which I get with a nested aggregation:
GET /_search
{
"size": 0,
"aggs": {
"by_date": {
"nested": {
"path": "interactions"
},
"aggs": {
"m_date": {"terms": {"field": "interactions.interact_date"}}
}
}
}
}
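Number of unique users interacting per day. If a given user interacts with several articles on the same date, that user should be counted only once. In Postgres this is a simple query, for a table with 3 columns [user_id, article_id, interact_date]: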
SELECT dt, count(uid)
FROM (SELECT interact_date::TIMESTAMP::DATE dt, user_id uid FROM interactions
GROUP BY interact_date::TIMESTAMP::DATE, user_id) by_date
GROUP BY dt;
How can I do the same thing in an Elasticsearch index?
How can I add an interaction through _update without reindexing the whole document?
Thanks
Answer 0 (score: 1)
Number of unique users interacting per day:
{
"size": 0,
"aggs": {
"nested_agg": {
"nested": {
"path": "interactions"
},
"aggs": {
"per_day": {
"date_histogram": {
"field": "interactions.interact_date",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"users_count": {
"reverse_nested": {},
"aggs": {
"uniques": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
}
}
}
}
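To make it easier to see what this aggregation computes, here is a minimal Python sketch that reproduces the same logic in memory. It visits every nested interaction, buckets it by day, then steps back to the parent user (the reverse_nested part) and counts distinct user_id values (the cardinality part). The function name unique_users_per_day and the second sample user "21" are made up for illustration. Note also that the Elasticsearch cardinality aggregation is approximate on large data sets, while this sketch is exact.

```python
from collections import defaultdict


def unique_users_per_day(users):
    """Map each day to the number of distinct users with an interaction on it."""
    users_by_day = defaultdict(set)
    for user in users:
        # "nested": visit every interaction of the user
        for interaction in user["interactions"]:
            # "date_histogram" with a day interval: keep only YYYY-MM-DD
            day = interaction["interact_date"][:10]
            # "reverse_nested" + "cardinality": collect distinct parent user ids
            users_by_day[day].add(user["user_id"])
    return {day: len(ids) for day, ids in sorted(users_by_day.items())}


# Sample data: user "20" comes from the question, user "21" is hypothetical.
users = [
    {
        "user_id": "20",
        "interactions": [
            {"article_id": "111", "interact_date": "2015-01-01"},
            {"article_id": "111", "interact_date": "2015-01-02"},
            {"article_id": "222", "interact_date": "2015-01-01"},
        ],
    },
    {
        "user_id": "21",
        "interactions": [
            {"article_id": "222", "interact_date": "2015-01-01"},
        ],
    },
]

print(unique_users_per_day(users))
# {'2015-01-01': 2, '2015-01-02': 1}
```

User "20" has two interactions on 2015-01-01 but is counted once for that day, which is the behaviour the SQL query in the question asks for.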
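How can I add an interaction through _update without reindexing the whole document?
That is not possible. It follows from the definition of a nested object: to update, add, or remove a nested object, we have to reindex the whole document. The _update API can save you from sending the full document over the wire, but internally Elasticsearch still fetches the source, modifies it, and indexes the whole document again.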
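What appending a nested object amounts to can be sketched in plain Python. This is only an illustration of the fetch, modify, write-back cycle, not Elasticsearch client code: the index dict and the function append_interaction are hypothetical stand-ins.

```python
import copy

# Hypothetical in-memory stand-in for the index, keyed by user_id.
index = {
    "20": {
        "user_id": "20",
        "interactions": [
            {"article_id": "111", "interact_date": "2015-01-01"},
            {"article_id": "111", "interact_date": "2015-01-02"},
            {"article_id": "222", "interact_date": "2015-01-01"},
        ],
    }
}


def append_interaction(index, user_id, interaction):
    """Append one interaction by rewriting the whole user document."""
    doc = copy.deepcopy(index[user_id])      # 1. fetch the current source
    doc["interactions"].append(interaction)  # 2. modify it
    index[user_id] = doc                     # 3. write the whole document back
    return doc


updated = append_interaction(
    index, "20", {"article_id": "333", "interact_date": "2015-01-03"}
)
print(len(updated["interactions"]))
# 4
```

The cost of step 3 grows with the number of interactions already stored on the user, which is worth keeping in mind for very active users.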
How can I filter by specific articles, so that a user is counted once per day only if they interacted with one of the specified articles?
{
"size": 0,
"query": {
"nested": {
"path": "interactions",
"query": {
"term": {
"interactions.article_id": {
"value": "222"
}
}
}
}
},
"aggs": {
"nested_agg": {
"nested": {
"path": "interactions"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"interactions.article_id": {
"value": "222"
}
}
},
"aggs": {
"per_day": {
"date_histogram": {
"field": "interactions.interact_date",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"users_count": {
"reverse_nested": {},
"aggs": {
"uniques": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
}
}
}
}
}
}
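The request above filters twice: the nested query selects users with at least one matching interaction, and the filter aggregation then discards the other interactions of those users. A minimal Python sketch of the same logic shows why the second filter matters. The function name unique_users_per_day_for_articles and the second sample user "21" are made up for illustration.

```python
from collections import defaultdict


def unique_users_per_day_for_articles(users, article_ids):
    """Per day, count distinct users who interacted with any of article_ids."""
    wanted = set(article_ids)
    users_by_day = defaultdict(set)
    for user in users:
        for interaction in user["interactions"]:
            # The "filtered" aggregation: skip interactions on other articles.
            if interaction["article_id"] not in wanted:
                continue
            day = interaction["interact_date"][:10]
            users_by_day[day].add(user["user_id"])
    return {day: len(ids) for day, ids in sorted(users_by_day.items())}


# Sample data: user "20" comes from the question, user "21" is hypothetical.
users = [
    {
        "user_id": "20",
        "interactions": [
            {"article_id": "111", "interact_date": "2015-01-01"},
            {"article_id": "111", "interact_date": "2015-01-02"},
            {"article_id": "222", "interact_date": "2015-01-01"},
        ],
    },
    {
        "user_id": "21",
        "interactions": [
            {"article_id": "222", "interact_date": "2015-01-01"},
        ],
    },
]

print(unique_users_per_day_for_articles(users, ["222"]))
# {'2015-01-01': 2}
```

Without the inner filter, user "20" would also show up on 2015-01-02 because of article "111", even though the query only asked about article "222". To match several articles in the real request, a terms query with a list of ids in place of term should work in both the query and the filter aggregation.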