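In other words, how do I summarise one column (e.g. income) while keeping another column (e.g. location)?
This MWE illustrates my problem. After running summarise(), how do I add the location column back in? Is there a solution that involves "going up a level" before summarise(), so that I can retain the original column?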
library(data.table)  # as.data.table()
library(dplyr)       # %>%, group_by(), summarise()
test <- as.data.table(data.frame(event_id = c("A","B","A","A","B"),
income = c(1,2,3,4,5),
location = c("PlaceX","PlaceY","PlaceX","PlaceX","PlaceY")))
test
event_id income location
1: A 1 PlaceX
2: B 2 PlaceY
3: A 3 PlaceX
4: A 4 PlaceX
5: B 5 PlaceY
test %>%
group_by(event_id) %>%
summarise(mean_inc = mean(income))
Source: local data table [2 x 2]
event_id mean_inc
(fctr) (dbl)
1 A 2.666667
2 B 3.500000
The following does not work:
test %>%
group_by(event_id) %>%
summarise(mean_inc = mean(income),
location = location)
Source: local data table [5 x 3]
event_id mean_inc location
(fctr) (dbl) (fctr)
1 A 2.666667 PlaceX
2 A 2.666667 PlaceX
3 A 2.666667 PlaceX
4 B 3.500000 PlaceY
5 B 3.500000 PlaceY
My desired output is:
Source: local data table [2 x 3]
event_id location mean_inc
(fctr) (fctr) (dbl)
1 A PlaceX 2.666667
2 B PlaceY 3.500000
Answer 0 (score: 1)
I hope I have understood what you want. Use inner_join to restore the missing columns (assuming they match the group_by arguments one-to-one):
newtest <- test %>%
group_by(event_id) %>%
summarise(mean_inc = mean(income)) %>% inner_join(test[-(1:2)])
#Joining by: "event_id"
newtest
#-----------------
Source: local data table [3 x 4]
event_id mean_inc income location
(fctr) (dbl) (dbl) (fctr)
1 A 2.666667 3 PlaceX
2 A 2.666667 4 PlaceX
3 B 3.500000 5 PlaceY
Alternatively, you may want to group by both event_id and location:
test %>%
group_by(event_id,location) %>%
summarise(mean_inc = mean(income))
#---------
#Source: local data table [2 x 3]
#Groups: event_id
event_id location mean_inc
(fctr) (fctr) (dbl)
1 A PlaceX 2.666667
2 B PlaceY 3.500000
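If the join should return exactly one row per group, one possible sketch is to join against the distinct event_id/location pairs rather than the raw rows. This assumes location really is constant within each event_id; the helper table name lookup is hypothetical, and a plain data.frame stands in for the data.table from the question.

```r
library(dplyr)

test <- data.frame(
  event_id = c("A", "B", "A", "A", "B"),
  income   = c(1, 2, 3, 4, 5),
  location = c("PlaceX", "PlaceY", "PlaceX", "PlaceX", "PlaceY")
)

# Hypothetical helper: one row per event_id with its location.
# Assumption: each event_id maps to exactly one location.
lookup <- test %>%
  distinct(event_id, location)

# Summarise first, then bring location back with a one-to-one join.
result <- test %>%
  group_by(event_id) %>%
  summarise(mean_inc = mean(income)) %>%
  inner_join(lookup, by = "event_id")

print(result)
```

If an event_id had more than one location, the join would return one row per pair, so it is worth checking nrow(lookup) against the number of groups first.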
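Answer 1 (score: 0)
One option may be to use mutate and then pull out one value per group with distinct.
How useful this is depends on the actual use case: it seems most useful when the new variable has the same name as the original variable it summarises. Otherwise, you end up with the original, unsummarised variable in the final dataset.
distinct works here because the object is still grouped.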
test %>%
group_by(event_id) %>%
mutate(income = mean(income)) %>%
distinct()
Source: local data table [2 x 3]
event_id income location
(fctr) (dbl) (fctr)
1 A 2.666667 PlaceX
2 B 3.500000 PlaceY
In dplyr_0.4.3.9000, you need .keep_all = TRUE in distinct.
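As a rough sketch of the mutate-then-distinct idea under a newer dplyr, the version below stores the mean in a separate column (mean_inc, following the question's naming) and drops income explicitly, so the unsummarised variable does not remain. It assumes a dplyr version where distinct() accepts .keep_all, and uses a plain data.frame rather than the data.table from the question.

```r
library(dplyr)

test <- data.frame(
  event_id = c("A", "B", "A", "A", "B"),
  income   = c(1, 2, 3, 4, 5),
  location = c("PlaceX", "PlaceY", "PlaceX", "PlaceX", "PlaceY")
)

result <- test %>%
  group_by(event_id) %>%
  mutate(mean_inc = mean(income)) %>%        # group mean repeated on every row
  ungroup() %>%
  distinct(event_id, .keep_all = TRUE) %>%   # keep the first row of each event_id
  select(event_id, location, mean_inc)       # drop the unsummarised income column

print(result)
```

Because distinct() keeps the first row per event_id, any column that varies within a group would silently take its first value; here that is harmless only because location is constant per event.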