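Consider the following dplyr query:
> mpg %>% group_by(class) %>% summarise(n())
# A tibble: 7 x 2
  class        n()
  <chr>      <int>
1 2seater        5
2 compact       47
3 midsize       41
4 minivan       11
5 pickup        33
6 subcompact    35
7 suv           62
Now I want to filter the results as follows:
> mpg %>% group_by(class) %>% filter(hwy==21) %>% summarise(n())
Output:
# A tibble: 2 x 2
  class        n()
  <chr>      <int>
1 minivan        1
2 subcompact     1
That is, I want to show only the car classes that have a highway mileage (hwy) of 21. This is the expected result, but I would like to see all classes, with n() reported as 0 for any class that has no car with a highway mileage of 21. Can I do that?
In other words, I want a dplyr query that produces the following output:
# A tibble: 7 x 2
  class        n()
  <chr>      <int>
1 2seater        0
2 compact        0
3 midsize        0
4 minivan        1
5 pickup         0
6 subcompact     1
7 suv            0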
where n() is the number of cars in each class with a highway mileage of 21.
Is this possible?
Answer:
Try this:
library(ggplot2)  # provides the mpg data set
library(dplyr)
mpg %>% mutate(k = (hwy == 21)) %>% group_by(class) %>%
  summarise(n = sum(k))
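The approach above works because mutate() only flags rows (k is TRUE when hwy == 21) rather than removing them. Every class therefore still reaches summarise(), and sum() counts TRUE as 1, so classes with no matching cars get 0. By contrast, filter() drops those rows before grouping, so their groups disappear. Below is a minimal sketch of two equivalent variants. It assumes mpg comes from ggplot2. The second variant swaps in a different technique: it converts class to a factor and calls count() with .drop = FALSE, which assumes dplyr 0.8.0 or later. Both variant names, counts_sum and counts_drop, are hypothetical.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Variant 1: skip the helper column and sum the logical test directly.
# Rows are never removed, so every class appears, with 0 where nothing matches.
counts_sum <- mpg %>%
  group_by(class) %>%
  summarise(n = sum(hwy == 21))

# Variant 2 (assumes dplyr >= 0.8.0): filter first, but keep empty groups.
# filter() keeps unused factor levels, and count(.drop = FALSE) reports
# a row with n = 0 for every level that has no remaining rows.
counts_drop <- mpg %>%
  mutate(class = factor(class)) %>%
  filter(hwy == 21) %>%
  count(class, .drop = FALSE)

print(counts_sum)
print(counts_drop)
```

The first variant is usually simpler. The second is handy when the filter condition is already written as a filter() step and you only want to stop empty classes from vanishing.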