I have a data frame:
metric1 metric2 metric3 field1 field2
1 1.07809668 4.2569882 7.1710095 L S1
2 0.56174763 1.2660273 -0.3751915 L S2
3 1.17447327 5.5186679 11.6868322 L S2
4 0.32830724 -0.8374830 1.8973718 S S2
5 -0.51213503 -0.3076640 10.0730274 S S1
6 0.24133119 2.7984703 15.9622215 S S1
7 1.96664414 0.1818531 2.7416768 S S3
8 0.06669409 3.8652075 10.5066330 S S3
9 1.14660437 8.5703119 3.4294062 L S4
10 -0.72785683 9.3320762 1.3827989 L S4
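To make the answers below easy to reproduce, here is a minimal sketch that rebuilds the data frame shown above with base R's read.table(). The name DF matches the question. Reading field1/field2 as character columns (stringsAsFactors = FALSE) is our own choice, not something the question specifies.

```r
# Rebuild the question's data frame from its printed form.
# The header has one field fewer than the data rows, so read.table()
# uses the first column (1..10) as row names.
DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1 1.07809668 4.2569882 7.1710095 L S1
2 0.56174763 1.2660273 -0.3751915 L S2
3 1.17447327 5.5186679 11.6868322 L S2
4 0.32830724 -0.8374830 1.8973718 S S2
5 -0.51213503 -0.3076640 10.0730274 S S1
6 0.24133119 2.7984703 15.9622215 S S1
7 1.96664414 0.1818531 2.7416768 S S3
8 0.06669409 3.8652075 10.5066330 S S3
9 1.14660437 8.5703119 3.4294062 L S4
10 -0.72785683 9.3320762 1.3827989 L S4
")
str(DF)
```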
I'm only showing two fields here, but there are several more. I need to summarize the metrics grouped by each field, e.g. for field1:
DF %>% group_by(field1) %>% summarise_each(funs(sum),metric1,metric2,metric3)
That gives me sum(metric1), sum(metric2) and sum(metric3) for one field at a time, but the table I need looks like this:
L(field1) S(field1) S1(field2) S2(field2) S3(field2) S4(field2)
sum(metric1)
sum(metric2)
sum(metric3)
I'm sure there must be a way to do this with tidyr and dplyr, but I can't figure it out.
Answer 0 (score: 6)
Using recast from the reshape2 package:
library(reshape2)
recast(DF, variable ~ field1 + field2, sum)
# variable L_S1 L_S2 L_S4 S_S1 S_S2 S_S3
# 1 metric1 1.078097 1.736221 0.4187475 -0.2708038 0.3283072 2.033338
# 2 metric2 4.256988 6.784695 17.9023881 2.4908063 -0.8374830 4.047061
# 3 metric3 7.171010 11.311641 4.8122051 26.0352489 1.8973718 13.248310
This is the same as
dcast(melt(DF, c("field1", "field2")), variable ~ field1 + field2, sum)
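To see what recast() does in one call, here is a sketch of the two underlying steps. It rebuilds DF from the question; the intermediate names long and wide are only illustrative.

```r
library(reshape2)

DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1 1.07809668 4.2569882 7.1710095 L S1
2 0.56174763 1.2660273 -0.3751915 L S2
3 1.17447327 5.5186679 11.6868322 L S2
4 0.32830724 -0.8374830 1.8973718 S S2
5 -0.51213503 -0.3076640 10.0730274 S S1
6 0.24133119 2.7984703 15.9622215 S S1
7 1.96664414 0.1818531 2.7416768 S S3
8 0.06669409 3.8652075 10.5066330 S S3
9 1.14660437 8.5703119 3.4294062 L S4
10 -0.72785683 9.3320762 1.3827989 L S4
")

# Step 1: melt() keeps field1/field2 as id columns and stacks metric1..metric3
# into a 'variable' column (metric name) and a 'value' column (10 x 3 = 30 rows).
long <- melt(DF, id.vars = c("field1", "field2"))
head(long)

# Step 2: dcast() puts each metric on a row and each observed field1_field2
# combination in a column. Several rows land in the same cell, so
# fun.aggregate = sum adds them up.
wide <- dcast(long, variable ~ field1 + field2, fun.aggregate = sum)
wide
```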
You could also combine it with tidyr::gather if you like, but you can't use tidyr::spread here, because it has no fun.aggregate argument:
library(tidyr)
DF %>%
gather(variable, value, -(field1:field2)) %>%
dcast(variable ~ field1 + field2, sum)
# variable L_S1 L_S2 L_S4 S_S1 S_S2 S_S3
# 1 metric1 1.078097 1.736221 0.4187475 -0.2708038 0.3283072 2.033338
# 2 metric2 4.256988 6.784695 17.9023881 2.4908063 -0.8374830 4.047061
# 3 metric3 7.171010 11.311641 4.8122051 26.0352489 1.8973718 13.248310
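The spread() limitation above applies to older tidyr. In newer tidyr, pivot_wider() has a values_fn argument, so the aggregation can stay inside tidyr. A minimal sketch, assuming tidyr 1.1.0 or later (where values_fn accepts a plain function and names_sort exists):

```r
library(tidyr)  # assumes tidyr >= 1.1.0; tidyr also re-exports %>%

DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1 1.07809668 4.2569882 7.1710095 L S1
2 0.56174763 1.2660273 -0.3751915 L S2
3 1.17447327 5.5186679 11.6868322 L S2
4 0.32830724 -0.8374830 1.8973718 S S2
5 -0.51213503 -0.3076640 10.0730274 S S1
6 0.24133119 2.7984703 15.9622215 S S1
7 1.96664414 0.1818531 2.7416768 S S3
8 0.06669409 3.8652075 10.5066330 S S3
9 1.14660437 8.5703119 3.4294062 L S4
10 -0.72785683 9.3320762 1.3827989 L S4
")

wide <- DF %>%
  # one row per (field1, field2, metric) observation
  pivot_longer(metric1:metric3, names_to = "variable", values_to = "value") %>%
  pivot_wider(
    id_cols = variable,
    names_from = c(field1, field2),  # column names like "L_S1" (names_sep = "_")
    values_from = value,
    values_fn = sum,                 # sums cells that receive several values
    names_sort = TRUE                # L_S1, L_S2, L_S4, S_S1, ...
  )
wide
```

Only combinations that actually occur in the data become columns, so as with dcast() there is no L_S3 or S_S4.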
Answer 1 (score: 2)
For an all-dplyr and tidyr solution, you can do the following:
library(dplyr)
library(tidyr)
DF %>%
unite(variable, field1, field2) %>%
group_by(variable) %>%
summarise_each(funs(sum)) %>%
gather(metrics, value, -variable) %>%
spread(variable, value)
which gives:
#Source: local data frame [3 x 7]
#
# metrics L_S1 L_S2 L_S4 S_S1 S_S2 S_S3
#1 metric1 1.078097 1.736221 0.4187475 -0.2708038 0.3283072 2.033338
#2 metric2 4.256988 6.784695 17.9023881 2.4908063 -0.8374830 4.047061
#3 metric3 7.171010 11.311641 4.8122051 26.0352489 1.8973718 13.248310
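In current dplyr, summarise_each() and funs() are deprecated, and tidyr's gather()/spread() are superseded by pivot_longer()/pivot_wider(). Here is a sketch of the same pipeline with the newer verbs, assuming dplyr 1.0.0 or later (for across()) and a tidyr that provides the pivot functions:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()
library(tidyr)

DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1 1.07809668 4.2569882 7.1710095 L S1
2 0.56174763 1.2660273 -0.3751915 L S2
3 1.17447327 5.5186679 11.6868322 L S2
4 0.32830724 -0.8374830 1.8973718 S S2
5 -0.51213503 -0.3076640 10.0730274 S S1
6 0.24133119 2.7984703 15.9622215 S S1
7 1.96664414 0.1818531 2.7416768 S S3
8 0.06669409 3.8652075 10.5066330 S S3
9 1.14660437 8.5703119 3.4294062 L S4
10 -0.72785683 9.3320762 1.3827989 L S4
")

res <- DF %>%
  unite(variable, field1, field2) %>%        # "L_S1", "L_S2", ... (sep = "_")
  group_by(variable) %>%
  # across() replaces summarise_each(funs(sum)); the grouping column is skipped
  summarise(across(everything(), sum)) %>%
  pivot_longer(-variable, names_to = "metrics", values_to = "value") %>%
  pivot_wider(names_from = variable, values_from = value)
res
```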
Edit
After reading the comments on David's answer, I think this is closer to your expected output:
field1 <- group_by(DF, field = field1) %>% summarise_each(funs(sum), -(field1:field2))
field2 <- group_by(DF, field = field2) %>% summarise_each(funs(sum), -(field1:field2))
bind_rows(field1, field2) %>%
gather(metrics, value, -field) %>%
spread(field, value)
which gives:
#Source: local data frame [3 x 7]
#
# metrics L S S1 S2 S3 S4
#1 metric1 3.233065 2.090842 0.8072928 2.064528 2.033338 0.4187475
#2 metric2 28.944071 5.700384 6.7477945 5.947212 4.047061 17.9023881
#3 metric3 23.294855 41.180931 33.2062584 13.209013 13.248310 4.8122051
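Since the question mentions several more field columns, writing one group_by() per field does not scale well. One alternative is to stack all field columns into long format first and aggregate once. This is a sketch, not the answer's method. It assumes the grouping columns all start with "field" and the metrics with "metric" (a naming assumption for starts_with()), and tidyr 1.1.0 or later:

```r
library(tidyr)  # assumes tidyr >= 1.1.0; starts_with() and %>% are re-exported

DF <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
metric1 metric2 metric3 field1 field2
1 1.07809668 4.2569882 7.1710095 L S1
2 0.56174763 1.2660273 -0.3751915 L S2
3 1.17447327 5.5186679 11.6868322 L S2
4 0.32830724 -0.8374830 1.8973718 S S2
5 -0.51213503 -0.3076640 10.0730274 S S1
6 0.24133119 2.7984703 15.9622215 S S1
7 1.96664414 0.1818531 2.7416768 S S3
8 0.06669409 3.8652075 10.5066330 S S3
9 1.14660437 8.5703119 3.4294062 L S4
10 -0.72785683 9.3320762 1.3827989 L S4
")

res <- DF %>%
  # stack every grouping column: each original row appears once per field column
  pivot_longer(starts_with("field"), names_to = "field_name", values_to = "field") %>%
  # stack the metrics as well
  pivot_longer(starts_with("metric"), names_to = "metrics", values_to = "value") %>%
  # one output column per field level, summing every value that falls into it
  pivot_wider(id_cols = metrics, names_from = field, values_from = value,
              values_fn = sum, names_sort = TRUE)
res
```

Because all field levels share one field column, a label that appears in two different field columns would be summed into a single output column. Using names_from = c(field_name, field) keeps them apart, at the cost of column names such as field1_L.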