R: How to "aggregate" (or combine) character columns?

Time: 2015-12-30 22:26:03

Tags: r

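I have a df with three columns. Each column holds either a character string or NA, and each row contains exactly one string. Like this example: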

df <- data.frame(a=c("NA","NA","NA","NA","fruits","fruits","fruits","fruits","fruits","fruits"), 
                 b=c("NA","NA","veggies","veggies","NA","NA","NA","NA","NA","NA"),
                 c=c("nuts","nuts","NA","NA","NA","NA","NA","NA","NA","NA") )

I want to merge all three columns to get this result:

1     nuts
2     nuts
3  veggies
4  veggies
5   fruits
6   fruits
7   fruits
8   fruits
9   fruits
10  fruits

With numeric values I would use aggregate with na.rm=TRUE. However, I don't know how to do this with characters. Any ideas? Thanks

3 answers:

Answer 0: (score: 1)

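We can use max.col after converting the string "NA" to a real NA. max.col gives the column index of the non-NA value in each row; we pair it with the row index, extract the values, and then convert to a data.frame.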

is.na(df) <- df=='NA'
data.frame(var=df[cbind(1:nrow(df),max.col(!is.na(df)))])
#      var
#1     nuts
#2     nuts
#3  veggies
#4  veggies
#5   fruits
#6   fruits
#7   fruits
#8   fruits
#9   fruits
#10  fruits
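
To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same idea. The intermediate names (present, col_idx, idx, result) are made up for illustration, and stringsAsFactors = FALSE and ties.method = "first" are assumptions added here to keep the behaviour predictable; they are not part of the original answer.

```r
# Rebuild the example data as plain character columns
df <- data.frame(a = c(rep("NA", 4), rep("fruits", 6)),
                 b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
                 c = c("nuts", "nuts", rep("NA", 8)),
                 stringsAsFactors = FALSE)

# Step 1: turn the literal string "NA" into a real missing value
is.na(df) <- df == "NA"

# Step 2: logical matrix, TRUE where a cell holds a value
present <- !is.na(df)

# Step 3: for each row, the column position of the single TRUE
col_idx <- max.col(present, ties.method = "first")

# Step 4: (row, column) pairs as a two-column index matrix
idx <- cbind(seq_len(nrow(df)), col_idx)

# Step 5: pick one cell per row from the character matrix
result <- as.matrix(df)[idx]
data.frame(var = result)
```

Because every row has exactly one non-NA value, there are no ties for max.col to break; with several values per row the chosen column would depend on ties.method.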

Or another option is

data.frame(var= df[cbind(1:nrow(df),(+!is.na(df)) %*% seq_along(df))])
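
The matrix product above works because each row of the 0/1 indicator matrix contains a single 1, so multiplying by the column numbers returns the position of that 1. A small sketch on a three-row data frame (a shortened stand-in for the example data, with real NA values already in place; the variable names are illustrative):

```r
# Shortened stand-in for the example data, with real NA values
df <- data.frame(a = c(NA, NA, "fruits"),
                 b = c(NA, "veggies", NA),
                 c = c("nuts", NA, NA),
                 stringsAsFactors = FALSE)

# 0/1 matrix: 1 where a value is present (unary + turns TRUE/FALSE into 1/0)
indicator <- +!is.na(df)

# Each row sums (indicator * column number); with one 1 per row this is
# simply the column number of the value
col_idx <- drop(indicator %*% seq_along(df))

# Use (row, column) pairs to extract one cell per row
result <- as.matrix(df)[cbind(seq_len(nrow(df)), col_idx)]
result
```

This shortcut relies on the one-value-per-row property stated in the question: a row with values in columns 1 and 2 would yield 3 and silently pick the wrong column.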

Answer 1: (score: 0)

To flesh out the idea given in the comments, you can do this:

data.frame(var = apply(df, 1, function(x) paste(gsub("NA", "", x), collapse = "")) )

      var
1     nuts
2     nuts
3  veggies
4  veggies
5   fruits
6   fruits
7   fruits
8   fruits
9   fruits
10  fruits
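
One caveat with the gsub("NA", "", x) call above: the pattern also matches "NA" inside a longer value (a string such as "BANANA" would lose its "NA" pieces). A possible alternative that avoids pattern matching, sketched here in base R, is to convert the "NA" strings to real NA and then take the first non-missing value column by column. The helper name coalesce2 is hypothetical, and stringsAsFactors = FALSE is an assumption so that ifelse sees character vectors rather than factors.

```r
# Example data as plain character columns
df <- data.frame(a = c(rep("NA", 4), rep("fruits", 6)),
                 b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
                 c = c("nuts", "nuts", rep("NA", 8)),
                 stringsAsFactors = FALSE)

# Convert the literal string "NA" to a real missing value
is.na(df) <- df == "NA"

# Hypothetical helper: keep x where it is present, otherwise fall back to y
coalesce2 <- function(x, y) ifelse(is.na(x), y, x)

# Fold the helper over the columns from left to right
combined <- Reduce(coalesce2, df)
data.frame(var = combined)
```

If a row ever held more than one value, this version would keep the leftmost one, whereas the paste approach would concatenate them.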

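Answer 2: (score: 0)

Whether this is better or worse than a row-wise approach may depend on the actual data. Here is one way to get printed output like the one you specified: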

> as.matrix( df[df!="NA"] )

Or perhaps better:

> cat( paste( "\n", df[ df!="NA" ] ) )

 fruits 
 fruits 
 fruits 
 fruits 
 fruits 
 fruits 
 veggies 
 veggies 
 nuts 
 nuts
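
Note that the output above comes out in column order (fruits first), because indexing with a logical matrix walks the data column by column. If the row order from the question matters, one possible adjustment, sketched here with illustrative variable names, is to transpose first so that the same indexing walks the original rows:

```r
# Example data with the literal string "NA", as in the question
df <- data.frame(a = c(rep("NA", 4), rep("fruits", 6)),
                 b = c("NA", "NA", "veggies", "veggies", rep("NA", 6)),
                 c = c("nuts", "nuts", rep("NA", 8)),
                 stringsAsFactors = FALSE)

# t() returns a 3 x 10 character matrix; its columns are the original rows
m <- t(df)

# Column-major indexing of the transpose visits the original data row by row
by_row <- m[m != "NA"]

# Print one value per line
cat(paste0(by_row, "\n"), sep = "")
```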