Removing unwanted commas when pasting multiple columns into one column in R

Date: 2017-07-13 09:38:19

Tags: r

I have four columns, Associated_Doll1:Associated_Doll4:

Associated_Doll1 | Associated_Doll2 | Associated_Doll3 | Associated_Doll4
Doll_Hair        | Doll_hand        | Doll_body        | Doll_Leg
RED              | WHITE            | NA               | NA
NA               | NA               | Apple            | Orange

Desired output column:

Doll_Hair,Doll_hand,Doll_body,Doll_Leg
RED,WHITE
Apple,Orange

Code:

# Build the joined column one row at a time, dropping NA values
for (i in seq_along(B$Associated_Doll1)) {
  B$Doll[i] <- paste(na.omit(c(B$Associated_Doll1[i],
                               B$Associated_Doll2[i],
                               B$Associated_Doll3[i],
                               B$Associated_Doll4[i],
                               B$Associated_Doll5[i])), collapse = ",")
}

B$Doll <- gsub(",NA,",",",B$Doll)

B$Doll <- gsub(",NA","",B$Doll)

B$Doll <- gsub("NA,","",B$Doll)

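The code above runs very quickly on a small dataset of about 1,000 rows, but I would like it to be faster on a large dataset (1,000,000 observations with 10 columns). How can I improve it? Please advise.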

2 answers:

Answer 0 (score: 1)

You can do the following (thanks to @JanLauGe for the example df):

df <- data.frame(
     hair = c('RED', NA),
     hand = c('WHITE', NA),
     body = c(NA, 'Apple'),
     leg = c(NA, 'Orange'))

df$totals <- apply(df, 1, function(x) paste(na.omit(x), collapse = ","))

> df
  hair  hand  body    leg       totals
1  RED WHITE  <NA>   <NA>    RED,WHITE
2 <NA>  <NA> Apple Orange Apple,Orange
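
Note that apply(df, 1, ...) pastes every column of the data frame, so on a table that has other columns it helps to select the ones to join first. A minimal sketch, assuming the columns of interest all start with "Associated_Doll"; the data frame B below, including its id column, is a made-up stand-in for the asker's data:

```r
# Hypothetical stand-in for the asker's data frame B, with one extra column
B <- data.frame(
  id = 1:2,
  Associated_Doll1 = c("RED", NA),
  Associated_Doll2 = c("WHITE", NA),
  Associated_Doll3 = c(NA, "Apple"),
  Associated_Doll4 = c(NA, "Orange"),
  stringsAsFactors = FALSE
)

# Pick the columns to join by name pattern instead of listing them by hand
doll_cols <- grep("^Associated_Doll", names(B), value = TRUE)

# For each row: drop the NAs, then join what is left with commas
B$Doll <- apply(B[doll_cols], 1, function(x) paste(na.omit(x), collapse = ","))

B$Doll
# "RED,WHITE"    "Apple,Orange"
```

Because na.omit() removes the missing values before paste() runs, no "NA" text ever reaches the result and the gsub() clean-up from the question is not needed.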

Answer 1 (score: 0)

The stringr package is your friend:

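library(tidyverse)
library(stringr)
df <- tibble(
  hair = c('RED', NA),
  hand = c('WHITE', NA),
  body = c(NA, 'Apple'),
  leg = c(NA, 'Orange')
)

df %>%
  # Replace NAs with empty strings
  mutate(across(everything(), ~ str_replace_na(.x, replacement = ''))) %>%
  # Create new, joined column, separating the values with commas
  mutate(joined = str_c(hair, hand, body, leg, sep = ',')) %>%
  # Strip leading/trailing commas and collapse the repeats left by empty strings
  mutate(joined = str_replace_all(joined, '^,+|,+$', ''),
         joined = str_replace_all(joined, ',+', ','))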
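
For a table with around a million rows, calling a function once per row can become the bottleneck, so it may be worth pasting whole columns at once. The following is a rough base R sketch of that idea, not a benchmarked solution; join_non_na is a hypothetical helper name introduced here for illustration, and it assumes the values themselves are never empty strings and never contain commas:

```r
# Hypothetical helper: join the non-NA values of the given columns with
# commas, using one vectorized paste() call instead of a loop over rows.
join_non_na <- function(data, cols) {
  # Convert each column to character and replace its NAs with ""
  parts <- lapply(data[cols], function(x) {
    x <- as.character(x)
    x[is.na(x)] <- ""
    x
  })
  # Paste the columns together element-wise, separated by commas
  joined <- do.call(paste, c(unname(parts), sep = ","))
  # Strip leading/trailing commas, then collapse runs of commas
  joined <- gsub("^,+|,+$", "", joined)
  gsub(",+", ",", joined)
}

df <- data.frame(
  hair = c("RED", NA),
  hand = c("WHITE", NA),
  body = c(NA, "Apple"),
  leg = c(NA, "Orange"),
  stringsAsFactors = FALSE
)

df$totals <- join_non_na(df, c("hair", "hand", "body", "leg"))
df$totals
# "RED,WHITE"    "Apple,Orange"
```

If the data can contain empty strings or commas inside values, the comma clean-up would alter them, and the row-wise na.omit() approach is the safer choice.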