I have four columns, Associated_Doll1 through Associated_Doll4:
Associated_Doll1 | Associated_Doll2 | Associated_Doll3 | Associated_Doll4
Doll_Hair | Doll_hand | Doll_body | Doll_Leg
RED | WHITE | NA | NA
NA | NA | Apple | Orange
The output column I have produced:
Doll_Hair,Doll_Hand,Doll_body,Doll_leg
RED,WHITE
Apple,Orange
Code:
for(i in 1:length(B$Associated_Doll1))
{
B$Doll[i]<-paste(na.omit(c(B$Associated_Doll1[i],
B$Associated_Doll2[i],
B$Associated_Doll3[i],
B$Associated_Doll4[i],
B$Associated_Doll5[i])),collapse = ",")
}
B$Doll <- gsub(",NA,",",",B$Doll)
B$Doll <- gsub(",NA","",B$Doll)
B$Doll <- gsub("NA,","",B$Doll)
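For a small dataset of about 1,000 rows, the code above runs very quickly, but I would like better speed on a large dataset (1,000,000 observations with 10 columns). How can I improve it? Please suggest.
Answer 0: (score: 1)
You can do the following (thanks to @JanLauGe for the example df):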
df <- data.frame(
hair = c('RED', NA),
hand = c('WHITE', NA),
body = c(NA, 'Apple'),
leg = c(NA, 'Orange'))
df$totals <- apply(df, 1, function(x) paste(na.omit(x), collapse = ","))
> df
hair hand body leg totals
1 RED WHITE <NA> <NA> RED,WHITE
2 <NA> <NA> Apple Orange Apple,Orange
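Since apply() calls the function once per row, it may become slow at around a million rows. A possible vectorised sketch in base R is shown below: it pastes whole columns at once and then cleans up the separators. It assumes that no cell contains a comma or a genuine empty string; the helper name `parts` and the explicit `cols` vector are only illustrative.

```r
df <- data.frame(
  hair = c("RED", NA),
  hand = c("WHITE", NA),
  body = c(NA, "Apple"),
  leg  = c(NA, "Orange"),
  stringsAsFactors = FALSE
)
cols <- c("hair", "hand", "body", "leg")

# Replace NA with "" one column at a time (vectorised over rows)
parts <- lapply(df[cols], function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  x
})

# Paste all columns together in a single vectorised call
joined <- do.call(paste, c(unname(parts), sep = ","))

# Collapse runs of commas left by empty cells, then trim the ends
joined <- gsub(",+", ",", joined)
joined <- gsub("^,|,$", "", joined)

df$totals <- joined
```

Whether this is actually faster on the real data is worth checking with system.time() on a representative sample.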
Answer 1: (score: 0)
The stringr package is your friend:
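library(tidyverse)
library(stringr)
df <- tibble(
  hair = c('RED', NA),
  hand = c('WHITE', NA),
  body = c(NA, 'Apple'),
  leg = c(NA, 'Orange')
)
df %>%
  # Replace NAs with empty strings
  mutate(across(everything(), ~ str_replace_na(.x, replacement = ''))) %>%
  # Create the new, joined column with a comma separator
  mutate(joined = str_c(hair, hand, body, leg, sep = ',')) %>%
  # Collapse comma runs left by empty cells and trim leading/trailing commas
  mutate(joined = str_replace_all(joined, ',+', ','),
         joined = str_replace_all(joined, '^,|,$', ''))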
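As an alternative to replacing the NAs by hand, tidyr's unite() can drop missing values itself through its na.rm argument. The sketch below assumes tidyr 1.0.0 or later and character columns; the output column name `joined` is just an example.

```r
library(tidyr)

df <- data.frame(
  hair = c("RED", NA),
  hand = c("WHITE", NA),
  body = c(NA, "Apple"),
  leg  = c(NA, "Orange"),
  stringsAsFactors = FALSE
)

# Join the four columns with commas, skipping NA values;
# remove = FALSE keeps the original columns alongside the new one
out <- unite(df, "joined", hair, hand, body, leg,
             sep = ",", remove = FALSE, na.rm = TRUE)
```

Here `out$joined` holds the comma-separated values for each row, with the NA cells left out.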