Question

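I have searched as much as I can, but part of my problem is that I am not really sure what to ask. Here is my data, and how I would like it to end up: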

Now:

john    a Yes
john    b No
john    c No
Rebekah a Yes
Rebekah d No
Chase   c Yes
Chase   d No
Chase   e No
Chase   f No

How I would like it:

john     a,b,c    Yes
Rebekah  a,d      Yes
Chase    c,d,e,f  Yes

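Note that column 3 shows "Yes" when a row is the first row with that particular value in column 1. The third column is not necessary; I only added it because I thought I would try to do everything with if and for statements, but I think that would be very inefficient. Is there an efficient way to do this?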

Answer 1

Another option (using the data posted by @bgoldst):

library('dplyr')

out = df %>% 
      group_by(a) %>% 
      summarize(b = paste(unique(c(b)), collapse=","), c = "yes")

#> out
#Source: local data frame [3 x 3]

#        a       b   c
#1   Chase c,d,e,f yes
#2 Rebekah     a,d yes
#3    john   a,b,c yes

Using data.table:

library('data.table')

out = setDT(df)[, .(b = paste(unique(b),  collapse=','), c = "yes"), by = .(a)]

#> out
#         a       b   c
#1:    john   a,b,c yes
#2: Rebekah     a,d yes
#3:   Chase c,d,e,f yes
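
Both snippets in this answer wrap the column in unique() before pasting, while the base R versions in the other answer paste the values as they are. The sample data has no repeated values within a group, so the results match. The following is a minimal base R sketch of where they would differ, using a made-up vector (hypothetical, not part of the question's data):

```r
# Hypothetical values for a single group, with "a" repeated on purpose.
b <- c("a", "b", "a")

# Plain paste keeps every value, including the repeat.
with_dups <- paste(b, collapse = ",")

# unique() drops repeats first, keeping the order of first appearance.
without_dups <- paste(unique(b), collapse = ",")

print(with_dups)     # "a,b,a"
print(without_dups)  # "a,b"
```

Which one is wanted depends on whether repeats within a group are meaningful; the question does not say either way.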

Answer 2

You can do this with by():

df <- data.frame(
  a = c('john','john','john','Rebekah','Rebekah','Chase','Chase','Chase','Chase'),
  b = c('a','b','c','a','d','c','d','e','f'),
  c = c('Yes','No','No','Yes','No','Yes','No','No','No'),
  stringsAsFactors = FALSE
)
do.call(rbind, by(df, df$a, function(x) data.frame(a=x$a[1], b=paste0(x$b, collapse=','), c=x$c[1], stringsAsFactors=FALSE)))
##               a       b   c
## Chase     Chase c,d,e,f Yes
## john       john   a,b,c Yes
## Rebekah Rebekah     a,d Yes

Edit: Here is another approach, using tapply() to aggregate each column independently:

key <- unique(df$a);
data.frame(a=key,b=tapply(df$b,df$a,paste,collapse=',')[key],c=tapply(df$c,df$a,`[`,1)[key]);
##               a       b   c
## john       john   a,b,c Yes
## Rebekah Rebekah     a,d Yes
## Chase     Chase c,d,e,f Yes

Edit: Yet another approach is to merge() the results of a couple of aggregate() calls:

merge(aggregate(b~a,df,paste,collapse=','),aggregate(c~a,df,`[`,1));
##         a       b   c
## 1   Chase c,d,e,f Yes
## 2    john   a,b,c Yes
## 3 Rebekah     a,d Yes
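
The aggregate() result above comes back sorted by group name, whereas the desired output in the question keeps the groups in order of first appearance (john, Rebekah, Chase). One possible way to keep that order, sketched here as an assumption rather than as part of the original answer, is to turn the grouping column into a factor whose levels follow the order of appearance before aggregating:

```r
# Sample data rebuilt from the question (column names a, b, c as in the answer above).
df <- data.frame(
  a = c("john", "john", "john", "Rebekah", "Rebekah", "Chase", "Chase", "Chase", "Chase"),
  b = c("a", "b", "c", "a", "d", "c", "d", "e", "f"),
  c = c("Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Set the factor levels to the order of first appearance,
# so aggregate() orders the groups by these levels instead of alphabetically.
df$a <- factor(df$a, levels = unique(df$a))

# Collapse column b within each group.
res <- aggregate(b ~ a, data = df, FUN = paste, collapse = ",")

# Convert the grouping column back to plain character for the final result.
res$a <- as.character(res$a)

print(res)
```

The same factor trick should carry over to the by() and merge() variants, since they also order groups by factor level.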

Collapsing columns in R
