This is my mapping:
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "cpt_logs_mapping": {
      "properties": {
        "channel_id": {"type": "integer", "store": "yes", "index": "not_analyzed"},
        "playing_date": {"type": "string", "store": "yes", "index": "not_analyzed"},
        "country_code": {"type": "text", "store": "yes", "index": "analyzed"},
        "playtime_in_sec": {"type": "integer", "store": "yes", "index": "not_analyzed"},
        "channel_name": {"type": "text", "store": "yes", "index": "analyzed"},
        "device_report_tag": {"type": "text", "store": "yes", "index": "analyzed"}
      }
    }
  }
}
I want to query the index in a way similar to the following MySQL query:
SELECT
    channel_name,
    SUM(`playtime_in_sec`) AS playtime_in_sec
FROM
    channel_play_times_bar_chart
WHERE
    country_code = 'country' AND
    device_report_tag = 'device' AND
    channel_name = 'channel' AND
    playing_date BETWEEN 'date_range_start' AND 'date_range_end'
GROUP BY channel_id
ORDER BY SUM(`playtime_in_sec`) DESC
LIMIT 30;
So far, my Query DSL looks like this:
{
  "size": 0,
  "aggs": {
    "ch_agg": {
      "terms": {
        "field": "channel_id",
        "size": 30,
        "order": {
          "sum_agg": "desc"
        }
      },
      "aggs": {
        "sum_agg": {
          "sum": {
            "field": "playtime_in_sec"
          }
        }
      }
    }
  }
}
Question 1
My Query DSL does return the top 30 channel_ids by play time, but I am unsure how to add the other filters to the search, i.e. country_code, device_report_tag and playing_date.
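Question 2
The other problem is that the result set contains only the channel_id and play time fields, unlike the MySQL result set, which returns the channel_name and playtime_in_sec columns. In other words, I want to aggregate on the channel_id field, but the result set should return the corresponding channel_name for each group.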
NOTE: Performance is the top priority here, since this is supposed to run behind a graph generator that queries millions of documents or more.
Test data
hits: [
  {
    _index: "cpt_logs_index",
    _type: "cpt_logs_mapping",
    _id: "",
    _score: 1,
    _source: {
      ChID: 1453,
      playtime_in_sec: 35,
      device_report_tag: "mydev",
      channel_report_tag: "Sony Six",
      country_code: "SE",
      @timestamp: "2017-08-11",
    }
  },
  {
    _index: "cpt_logs_index",
    _type: "cpt_logs_mapping",
    _id: "",
    _score: 1,
    _source: {
      ChID: 145,
      playtime_in_sec: 25,
      device_report_tag: "mydev",
      channel_report_tag: "Star Movies",
      country_code: "US",
      @timestamp: "2017-08-11",
    }
  },
  {
    _index: "cpt_logs_index",
    _type: "cpt_logs_mapping",
    _id: "",
    _score: 1,
    _source: {
      ChID: 12,
      playtime_in_sec: 15,
      device_report_tag: "mydev",
      channel_report_tag: "HBO",
      country_code: "PK",
      @timestamp: "2017-08-12",
    }
  }
]
Answer 0 (score: 0)
Question 1:
Are you looking to add a filter/query to the example above? If so, you can simply add a "query" node to the query document:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {"terms": {"country_code": ["pk", "us", "se"]}},
        {"range": {"@timestamp": {"gt": "2017-01-01", "lte": "2017-08-11"}}}
      ]
    }
  },
  "aggs": {
    "ch_agg": {
      "terms": {
        "field": "ChID",
        "size": 30
      },
      "aggs": {
        "ch_report_tag_agg": {
          "terms": {
            "field": "channel_report_tag.keyword"
          },
          "aggs": {
            "sum_agg": {
              "sum": {
                "field": "playtime_in_sec"
              }
            }
          }
        }
      }
    }
  }
}
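You can use any of Elasticsearch's regular queries/filters to pre-filter the search before aggregating. (Regarding performance: Elasticsearch applies the filters/queries before it starts aggregating, so any filtering you can do here will help a lot.)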
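To tie this back to the SQL in the question, here is a minimal sketch in Python that builds such a request body as a plain dict. It is an illustration under assumptions, not a tested query for your cluster: `build_query` and its parameters are hypothetical names, the field names are taken from the mapping in the question, and the `filter` clause of `bool` assumes Elasticsearch 2.x or later. It also assumes `playing_date` holds ISO `yyyy-MM-dd` values (or is mapped as a date), so the range behaves like SQL's inclusive BETWEEN.

```python
import json


def build_query(country, device, date_from, date_to, top_n=30):
    """Build a search body: filter the documents first, then sum play time per channel.

    Hypothetical helper; field names come from the mapping in the question.
    """
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "query": {
            "bool": {
                # "filter" clauses narrow the document set without scoring it
                "filter": [
                    # On analyzed text fields, "term" matches the indexed tokens,
                    # which are usually lower-cased (assumption: standard analyzer).
                    {"term": {"country_code": country}},
                    {"term": {"device_report_tag": device}},
                    # Inclusive on both ends, like SQL BETWEEN
                    {"range": {"playing_date": {"gte": date_from, "lte": date_to}}},
                ]
            }
        },
        "aggs": {
            "ch_agg": {
                "terms": {
                    "field": "channel_id",
                    "size": top_n,                 # LIMIT 30
                    "order": {"sum_agg": "desc"},  # ORDER BY SUM(...) DESC
                },
                "aggs": {"sum_agg": {"sum": {"field": "playtime_in_sec"}}},
            }
        },
    }


# Print the JSON body that would be sent to the _search endpoint
print(json.dumps(build_query("se", "mydev", "2017-08-01", "2017-08-11"), indent=2))
```

Further conditions, such as the `channel_name = 'channel'` clause from the SQL, would be appended to the `filter` list in the same way.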
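Question 2:
Off the top of my head, I would suggest one of two solutions (unless I have completely misunderstood the question):
Add aggs levels for the fields you want in the output, in the order you want to drill down. (You can nest aggs within aggs quite deeply without problems, and you get the bonus of a document count at each level.)
Use a top_hits aggregation at the "lowest" level of the aggs, and specify the fields you want in the output with "_source": {"include": [/* fields */]}.
Could you provide a few test data records?
Also, it would be very useful to know which version of Elasticsearch you are running, since syntax and behaviour change a lot between major versions.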
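The top_hits option can be sketched in Python as follows. This is only an illustration under assumptions: `channel_aggs`, `rows_from_response` and the aggregation name `name_hit` are hypothetical, the field names follow the mapping in the question, and `sample_response` is hand-written to mimic the shape of a terms/top_hits response rather than captured from a real cluster. The list form of `_source` is used here because the object form is spelled differently across major versions.

```python
def channel_aggs(top_n=30):
    """Aggregate by channel_id, sum play time, and fetch one document per bucket
    so that the channel name can be read from its _source."""
    return {
        "ch_agg": {
            "terms": {
                "field": "channel_id",
                "size": top_n,
                "order": {"sum_agg": "desc"},
            },
            "aggs": {
                "sum_agg": {"sum": {"field": "playtime_in_sec"}},
                # One hit per bucket is enough: every document in a bucket
                # is assumed to carry the same channel_name.
                "name_hit": {"top_hits": {"size": 1, "_source": ["channel_name"]}},
            },
        }
    }


def rows_from_response(response):
    """Flatten the aggregation buckets into rows shaped like the SQL result set."""
    rows = []
    for bucket in response["aggregations"]["ch_agg"]["buckets"]:
        hits = bucket["name_hit"]["hits"]["hits"]
        name = hits[0]["_source"].get("channel_name") if hits else None
        rows.append({
            "channel_id": bucket["key"],
            "channel_name": name,
            "playtime_in_sec": bucket["sum_agg"]["value"],
        })
    return rows


# Hand-written response fragment for illustration only (values borrowed from the test data)
sample_response = {
    "aggregations": {
        "ch_agg": {
            "buckets": [
                {"key": 1453, "doc_count": 1, "sum_agg": {"value": 35.0},
                 "name_hit": {"hits": {"hits": [{"_source": {"channel_name": "Sony Six"}}]}}},
                {"key": 145, "doc_count": 1, "sum_agg": {"value": 25.0},
                 "name_hit": {"hits": {"hits": [{"_source": {"channel_name": "Star Movies"}}]}}},
            ]
        }
    }
}

for row in rows_from_response(sample_response):
    print(row)
```

Because only one small document is fetched for each of the 30 buckets, the extra cost of top_hits should stay modest, but that is worth measuring on your own data.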