I have a data frame with two id variables and one name variable. The number of combinations of these variables varies.
## dput'ed data.frame
df <- structure(list(V1 = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), V2 = c(1L, 2L, 3L, 1L,
2L, 3L, 2L, 2L, 1L, 3L, 1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L
), V3 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L, 1L,
2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L), .Label = c("test1", "test2",
"test3"), class = "factor")), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-20L))
>df
V1 V2 V3
1 A 1 test1
2 B 2 test2
3 C 3 test3
4 D 1 test1
5 E 2 test2
6 A 3 test3
7 B 2 test2
8 C 2 test2
9 D 1 test1
10 E 3 test3
11 A 1 test1
12 B 2 test2
13 C 1 test1
14 D 3 test3
15 E 2 test2
16 A 1 test1
17 B 1 test1
18 C 3 test3
19 D 1 test1
20 E 1 test1
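I want to combine the rows so that the result has only one entry per V1, with comma-separated lists of values as the second and third variables. Like this: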
f V2 V3
1 A 1 ,3 ,1 ,1 test1 ,test3 ,test1 ,test1
2 B 2 ,2 ,2 ,1 test2 ,test2 ,test2 ,test1
3 C 3 ,2 ,1 ,3 test3 ,test2 ,test1 ,test3
4 D 1 ,1 ,3 ,1 test1 ,test1 ,test3 ,test1
5 E 2 ,3 ,2 ,1 test2 ,test3 ,test2 ,test1
I tried this with the following code, which works fine, if a bit slowly. Any suggestions for a faster solution?
df = lapply(levels(df$V1), function(f){
cbind(f,
paste(df$V2[df$V1==f],collapse=" ,"),
paste(df$V3[df$V1==f],collapse=" ,"))
})
df = as.data.frame(do.call(rbind, df))
df
Edit: corrected the input (df)
Answer 0: (score: 3)
Make sure V3 (or any other factor variable) is converted to character with as.character, then use aggregate:
df$V3 = as.character(df$V3)
aggregate(df[-1], by=list(df$V1), c, simplify=FALSE)
# Group.1 V2 V3
# 1 A 1, 3, 1, 1 test1, test3, test1, test1
# 2 B 2, 2, 2, 1 test2, test2, test2, test1
# 3 C 3, 2, 1, 3 test3, test2, test1, test3
# 4 D 1, 1, 3, 1 test1, test1, test3, test1
# 5 E 2, 3, 2, 1 test2, test3, test2, test1
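Note that with `c` and `simplify=FALSE` the V2 and V3 columns are list columns that merely print with commas. If actual comma-separated strings are wanted, as in the question, one possible variation is to pass `paste` with `collapse` instead. This is a minimal sketch, not part of the original answer; the data frame is rebuilt compactly here so the block runs on its own, and the name `res` is just for illustration.

```r
# Rebuild the example data (same values as the dput output in the question).
df <- data.frame(
  V1 = factor(rep(c("A", "B", "C", "D", "E"), times = 4)),
  V2 = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L,
         1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L)
)
# In this data set V3 is always "test" followed by the V2 value.
df$V3 <- paste0("test", df$V2)

# Collapse every group into one string per column.
# Extra arguments (collapse = ",") are forwarded to FUN by aggregate().
res <- aggregate(df[c("V2", "V3")], by = list(V1 = df$V1),
                 FUN = paste, collapse = ",")
print(res)
```

Here V2 and V3 come back as plain character columns, so the result can be written out with `write.csv` or similar without further conversion.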
Answer 1: (score: 0)
do.call("rbind", lapply(split(df[, 2:3], df[,1]), function(x) sapply(x, paste, collapse=",")))
V2 V3
A "1,3,1,1" "test1,test3,test1,test1"
B "2,2,2,1" "test2,test2,test2,test1"
C "3,2,1,3" "test3,test2,test1,test3"
D "1,1,3,1" "test1,test1,test3,test1"
E "2,3,2,1" "test2,test3,test2,test1"
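The result above is a character matrix whose row names hold the V1 values, not a data frame. If a data frame with V1 as a regular column is needed, as in the question's desired output, something along these lines should work. This is a sketch under that assumption; the names `m` and `out` are illustrative, and the data frame is rebuilt so the block is self-contained.

```r
# Rebuild the example data (same values as the dput output in the question).
df <- data.frame(
  V1 = factor(rep(c("A", "B", "C", "D", "E"), times = 4)),
  V2 = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 3L,
         1L, 2L, 1L, 3L, 2L, 1L, 1L, 3L, 1L, 1L)
)
df$V3 <- factor(paste0("test", df$V2))

# Split columns 2:3 by V1, collapse each column per group, and stack the rows.
m <- do.call(rbind, lapply(split(df[, 2:3], df[, 1]),
                           function(x) sapply(x, paste, collapse = ",")))

# Move the row names into a proper V1 column and drop them afterwards.
out <- data.frame(V1 = rownames(m), m, stringsAsFactors = FALSE)
rownames(out) <- NULL
print(out)
```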