I have a dataset like this:
   V0 V1 V2 V3   X  Y
#1  1  A 21 31 123 12
#2  2  A 21 31 245 24
#3  3  B 22 32 234 25
#4  4  C 23 33 190 30
#5  5  C 23 33 210 20
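So V1, V2 and V3 contain duplicated values. I want to create a dataset like the one below, which sums X and Y within each V1-V3 group and collects the V0 values of each group in a new column V:
   V1 V2 V3   X  Y   V
#1  A 21 31 368 36 1,2
#2  B 22 32 234 25   3
#3  C 23 33 400 50 4,5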
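I am trying "aggregate" as shown below, but I don't know how to sum X and Y at the same time while keeping V2 and V3 at their original values instead of adding them up. I also don't know how to put the V0 values that belong to the same group (V1) into another new variable.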
df.sum <- aggregate(X~V1,data=df,FUN=sum)
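I tried merging "df.sum" and "df" by "V1", but it turned out that all the duplicated rows were carried into the merged result as well.
Any suggestions? Thank you very much!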
Answer 0: (score: 3)
You are on the right track. Just do:
aggregate(. ~ V1 + V2 + V3, mydf, sum)
#   V1 V2 V3   X  Y
# 1  A 21 31 368 36
# 2  B 22 32 234 25
# 3  C 23 33 400 50
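Note that `.` on the left-hand side sums every column not named on the right, so on the six-column data from the question it would also add up V0. A minimal base R sketch of one way around that, assuming the question's data rebuilt by hand as `df` (the names `sums`, `ids`, and `result` are only illustrative): name the columns to sum with `cbind()`, collapse V0 separately with `toString`, and merge the two results on the grouping columns.

```r
# Question's data, rebuilt by hand (an assumption for this sketch)
df <- data.frame(
  V0 = 1:5,
  V1 = c("A", "A", "B", "C", "C"),
  V2 = c(21L, 21L, 22L, 23L, 23L),
  V3 = c(31L, 31L, 32L, 33L, 33L),
  X  = c(123L, 245L, 234L, 190L, 210L),
  Y  = c(12L, 24L, 25L, 30L, 20L)
)

# Sum only X and Y within each V1/V2/V3 group
sums <- aggregate(cbind(X, Y) ~ V1 + V2 + V3, data = df, FUN = sum)

# Collapse the V0 ids of each group into one comma-separated string
ids <- aggregate(V0 ~ V1 + V2 + V3, data = df, FUN = toString)
names(ids)[names(ids) == "V0"] <- "V"

# Combine both summaries on the grouping columns
result <- merge(sums, ids, by = c("V1", "V2", "V3"))
print(result)
```

Because V2 and V3 appear on the right-hand side of the formula, they are used as grouping keys and keep their original values.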
You can also do this in many other ways. For example, here is an approach using "data.table":
library(data.table)
as.data.table(mydf)[, lapply(.SD, sum), by = list(V1, V2, V3)]
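The `lapply(.SD, sum)` call sums every non-grouping column, which would include V0 if it is present. A hedged sketch of a variant that names each summary explicitly, assuming the data.table package is installed and the question's data is rebuilt by hand as `dt` (the output column name `V` follows the question's desired result):

```r
library(data.table)

# Question's data, rebuilt by hand (an assumption for this sketch)
dt <- data.table(
  V0 = 1:5,
  V1 = c("A", "A", "B", "C", "C"),
  V2 = c(21L, 21L, 22L, 23L, 23L),
  V3 = c(31L, 31L, 32L, 33L, 33L),
  X  = c(123L, 245L, 234L, 190L, 210L),
  Y  = c(12L, 24L, 25L, 30L, 20L)
)

# Sum X and Y per group and paste the V0 ids of each group together
res_dt <- dt[, .(X = sum(X), Y = sum(Y), V = toString(V0)),
             by = .(V1, V2, V3)]
print(res_dt)
```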
Answer 1: (score: 3)
Or with dplyr:
library(dplyr)
df %>% group_by(V1,V2,V3) %>% summarise(X_sum=sum(X), Y_sum= sum(Y))
# Or as suggested, you could also do:
df %>% group_by(V1,V2,V3) %>% summarise_each(funs(sum))
#Source: local data frame [3 x 5]
#Groups: V1, V2
#
# V1 V2 V3 X_sum Y_sum
#1 A 21 31 368 36
#2 B 22 32 234 25
#3 C 23 33 400 50
# data
df <- structure(list(V1 = structure(c(1L, 1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), V2 = c(21L, 21L, 22L, 23L, 23L),
V3 = c(31L, 31L, 32L, 33L, 33L), X = c(123L, 245L, 234L,
190L, 210L), Y = c(12L, 24L, 25L, 30L, 20L)), .Names = c("V1",
"V2", "V3", "X", "Y"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
For the updated data, you can do the following:
df %>% group_by(V1,V2,V3) %>%
summarise_each(funs(sum, toString), X, Y, V0) %>%
select(-V0_sum,
-X_toString,
-Y_toString)
# you get
# V1 V2 V3 X_sum Y_sum V0_toString
# 1 A 21 31 368 36 1, 2
# 2 B 22 32 234 25 3
# 3 C 23 33 400 50 4, 5
# data
df <- structure(list(V0 = 1:5, V1 = structure(c(1L, 1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), V2 = c(21L, 21L, 22L, 23L, 23L),
V3 = c(31L, 31L, 32L, 33L, 33L), X = c(123L, 245L, 234L,
190L, 210L), Y = c(12L, 24L, 25L, 30L, 20L)), .Names = c("V0",
"V1", "V2", "V3", "X", "Y"), class = "data.frame", row.names = c(NA,
-5L))
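`summarise_each()` and `funs()` have since been deprecated in dplyr. A sketch of the same summary written with `across()`, assuming dplyr 1.0.0 or later and the question's data rebuilt by hand as `df` (the output column name `V` is chosen here to match the question's desired result; it is not part of the original answer):

```r
library(dplyr)

# Question's data, rebuilt by hand (an assumption for this sketch)
df <- data.frame(
  V0 = 1:5,
  V1 = c("A", "A", "B", "C", "C"),
  V2 = c(21L, 21L, 22L, 23L, 23L),
  V3 = c(31L, 31L, 32L, 33L, 33L),
  X  = c(123L, 245L, 234L, 190L, 210L),
  Y  = c(12L, 24L, 25L, 30L, 20L)
)

res <- df %>%
  group_by(V1, V2, V3) %>%
  summarise(
    across(c(X, Y), sum),   # sum X and Y within each group
    V = toString(V0),       # collect the group's V0 ids as "1, 2"
    .groups = "drop"        # return an ungrouped result
  )
print(res)
```

Computing each summary directly avoids creating the unwanted `V0_sum`, `X_toString`, and `Y_toString` columns and then dropping them with `select()`.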