I often find myself using the following dplyr pattern to compute summary statistics for a data frame:
1. Aggregate <-
2. Original Dataset %>%
3. Group_By %>%
4. Filter %>%
5. Summarize %>%
6. Left_Join(back to Aggregate)
For example:
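library(dplyr)
Original <- data.frame(A = 1:100, B = sample(LETTERS, 100, replace = TRUE), C = rnorm(100))
# Calculate 1st summary statistic: mean of C per group, using only rows with A > 50
Aggregate <- Original %>% group_by(B) %>%
  filter(A > 50) %>%
  summarize(meanC = mean(C))
# Calculate 2nd summary statistic: sum of C per group, joined back onto Aggregate
Aggregate <- Original %>% group_by(B) %>%
  summarize(Q = sum(C)) %>%
  # `.` is the piped-in summary; passing Original as y would join the raw rows instead
  left_join(x = Aggregate, y = ., by = "B")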
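I have two questions:
A) Is there a better way to build a table of summary statistics from another table? The left join feels clunky.
B) What is the "data.table" way of doing this, i.e. how do I join back onto the Aggregate table?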
Aggregate[Aggregate[,meanC:=mean(C),by=.(B)]]
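Thanks for any suggestions...
Answer 0: (score: 0)
If you use mutate instead of summarize after group_by, you can avoid the join. (Note: I don't know how to compute the filtered summary statistic this way. You may also want to ungroup afterwards to avoid unexpected behavior later.)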
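As a minimal sketch of the mutate approach described above (not necessarily the answerer's exact code), the example below adds both statistics as columns without any join. The name WithStats and the fixed seed are assumptions introduced for illustration. The filtered statistic is computed by subsetting inside the aggregate, mean(C[A > 50]), which is one possible way around the filter step. Unlike the filter-then-join version, groups with no rows where A > 50 are kept and get NaN.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
Original <- data.frame(A = 1:100,
                       B = sample(LETTERS, 100, replace = TRUE),
                       C = rnorm(100))

# mutate() keeps every row and adds the per-group statistics as new columns,
# so there is no separate table to join back.
WithStats <- Original %>%
  group_by(B) %>%
  mutate(Q = sum(C),
         meanC = mean(C[A > 50])) %>%  # NaN when a group has no rows with A > 50
  ungroup()                            # drop grouping to avoid surprises later

# If one row per group is wanted, both statistics fit in a single summarize().
Aggregate <- Original %>%
  group_by(B) %>%
  summarize(Q = sum(C),
            meanC = mean(C[A > 50]))
```

WithStats has the same 100 rows as Original plus the two new columns, while Aggregate has one row per distinct value of B.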