Let's say I have the following data frame:
x <- rep(c("s1", "s2", "s3"), each = 5)
y <- rep(c("a", "b", "c", "d", "e"), 3)
z <- 1:15
x_name <- "dimensions"
y_name <- "aspects"
z_name <- "value"
df <- data.frame(x, y, z)
names(df) <- c(x_name, y_name, z_name)
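How can I collapse/merge the factor levels 'a', 'c' and 'd' into a new level 'x' across 'dimensions' and 'value', so that their values are summed under the new level 'x'? The output should look like this:
I thought of using gsub to replace the names a, c and d with x, and then using aggregate to sum their values. But is there a simpler way? Also, if I have other columns that contain a, c or d, I'm not sure my solution would still work. I have looked through several related answers on the forum, but none of them solved this problem. Thanks.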
Answer 0: (score: 3)
First rename a, c and d to x, then sum by dimensions and aspects.
Read the data:
df <- data.frame(dimensions = x, aspects = y, value = z, stringsAsFactors = FALSE)
Base R solution:
# if you read the data my way, the following line is unnecessary
# df$aspects <- as.character(df$aspects)
df$aspects[df$aspects %in% c("a", "c", "d")] <- "x"
aggregate(value ~ ., df, sum)
Result:
  dimensions aspects value
1         s1       b     2
2         s2       b     7
3         s3       b    12
4         s1       e     5
5         s2       e    10
6         s3       e    15
7         s1       x     8
8         s2       x    23
9         s3       x    38
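If aspects is stored as a factor rather than as character (the default for data.frame() in R versions before 4.0), the as.character() step can be skipped by renaming the levels directly: assigning the same name to several levels merges them into one. A minimal sketch, assuming the same example data with aspects built as a factor:

```r
# Rebuild the example data with 'aspects' stored as a factor (an assumption
# made for this sketch; the answer above reads it as character).
df <- data.frame(
  dimensions = rep(c("s1", "s2", "s3"), each = 5),
  aspects    = factor(rep(c("a", "b", "c", "d", "e"), 3)),
  value      = 1:15
)

# Giving several levels the same name collapses them into a single level.
levels(df$aspects)[levels(df$aspects) %in% c("a", "c", "d")] <- "x"

# Sum the values within each dimension/aspect combination.
collapsed <- aggregate(value ~ dimensions + aspects, data = df, FUN = sum)
collapsed
```

The row order may differ from the result above, because the factor levels are no longer in alphabetical order after the merge.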
data.table solution:
library(data.table)
DT <- setDT(df)
DT[aspects %in% c("a", "c", "d"), aspects := "x"]
DT[, sum(value), by = .(dimensions, aspects)]
Result:
   dimensions aspects V1
1:         s1       x  8
2:         s1       b  2
3:         s1       e  5
4:         s2       x 23
5:         s2       b  7
6:         s2       e 10
7:         s3       x 38
8:         s3       b 12
9:         s3       e 15
Answer 1: (score: 2)
Here is a solution using plyr::revalue (see also plyr::mapvalues) and dplyr:
# install.packages("plyr")
library(dplyr)

df %>%
  mutate(aspects = plyr::revalue(aspects, c("a" = "x", "c" = "x", "d" = "x"))) %>%
  group_by(dimensions, aspects) %>%
  summarise(sum_value = sum(value))
#   dimensions aspects sum_value
#       (fctr)  (fctr)     (int)
# 1         s1       x         8
# 2         s1       b         2
# 3         s1       e         5
# 4         s2       x        23
# 5         s2       b         7
# 6         s2       e        10
# 7         s3       x        38
# 8         s3       b        12
# 9         s3       e        15