Paste multiple columns into a single column, but remove all NA, blank or duplicate values

Posted: 2018-12-19 21:09:25

Tags: r dplyr

I have data like the following:

dat <- data.frame(SOURCES1 = c("123 Name, 123 Rd, City, State", 
                               "354 Name, 354 Rd, City, State",
                               NA,"",""),
                  SOURCES2 = c("","",
                               "321 Name, 321 Rd, City, State", 
                               "678 Name, 678 Rd, City, State",
                               ""),
                  SOURCES3 = c("","",NA,
                               "678 Name, 678 Rd, City, State", 
                               NA),
                  SOURCES4 = c("","","",NA,NA),
                  SOURCES5 = c("","","",NA,NA))

I'm looking for a single column that looks like this:

"123 Name, 123 Rd, City, State"
"354 Name, 354 Rd, City, State"
"321 Name, 321 Rd, City, State"
"678 Name, 678 Rd, City, State"
NA

1 Answer

Answer (score: 5)

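After converting the blanks ("") to NA, we can use coalesce. Because coalesce keeps only the first non-NA value in each row, an address repeated in several columns (as in row 4) ends up in the result just once.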

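library(tidyverse)
dat %>% 
   # convert every column to character and blanks ("") to NA
   # (the ~ lambda replaces funs(), which is deprecated in dplyr >= 0.8)
   mutate_all(~ na_if(as.character(.), '')) %>% 
   # coalesce() keeps the first non-NA value in each row
   transmute(SOURCE = coalesce(!!! rlang::syms(names(.))))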
#                         SOURCE
#1 123 Name, 123 Rd, City, State
#2 354 Name, 354 Rd, City, State
#3 321 Name, 321 Rd, City, State
#4 678 Name, 678 Rd, City, State
#5                          <NA>   
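To see why this removes blanks and duplicates at the same time, here is a small sketch on made-up vectors (s1, s2 and s3 are hypothetical stand-ins for three columns, not data from the question). na_if() turns "" into NA, and coalesce() then fills each position from the first argument that is not NA there:

```r
library(dplyr)

# three toy "columns"; na_if() turns the blank strings into NA
s1 <- na_if(c("123 Rd", ""), "")
s2 <- na_if(c("", "678 Rd"), "")
s3 <- na_if(c("", "678 Rd"), "")   # duplicates s2 in the second row

# coalesce() scans s1, s2, s3 left to right and, position by position,
# keeps the first non-NA value, so the repeated "678 Rd" appears only once
out <- coalesce(s1, s2, s3)
out  # c("123 Rd", "678 Rd")
```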

Or using invoke from purrr:

dat %>% 
   mutate_all(~ na_if(as.character(.), '')) %>% 
   # invoke() is retired in newer purrr releases; exec() is its suggested replacement
   transmute(SOURCE = invoke(coalesce, .))
#                         SOURCE
#1 123 Name, 123 Rd, City, State
#2 354 Name, 354 Rd, City, State
#3 321 Name, 321 Rd, City, State
#4 678 Name, 678 Rd, City, State
#5                          <NA>
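With a current dplyr, the same idea can be written without funs() or purrr. This is a sketch that assumes dplyr >= 1.0.0 (for across()); it re-creates dat from the question so it runs on its own, and uses base do.call() where the answer above used invoke():

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

# dat from the question, re-created so this block is self-contained
dat <- data.frame(
  SOURCES1 = c("123 Name, 123 Rd, City, State",
               "354 Name, 354 Rd, City, State", NA, "", ""),
  SOURCES2 = c("", "", "321 Name, 321 Rd, City, State",
               "678 Name, 678 Rd, City, State", ""),
  SOURCES3 = c("", "", NA, "678 Name, 678 Rd, City, State", NA),
  SOURCES4 = c("", "", "", NA, NA),
  SOURCES5 = c("", "", "", NA, NA)
)

# Step 1: every column as character, with blank strings turned into NA
cleaned <- dat %>%
  mutate(across(everything(), ~ na_if(as.character(.x), "")))

# Step 2: pass each column to coalesce() as a separate (unnamed) argument
res <- data.frame(SOURCE = do.call(coalesce, unname(as.list(cleaned))))
res
```

The column names are dropped before do.call() so that they can never be matched against coalesce()'s own named arguments.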

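Or using pmax from base R. Note that pmax compares character strings alphabetically, so this returns the single non-blank value in each row here; if a row held two different addresses, it would keep the one that sorts last rather than the first one.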

# turn blanks into NA, then take the row-wise maximum while ignoring NA
do.call(pmax, c(lapply(dat, function(x) replace(as.character(x), x == "", NA)),
                na.rm = TRUE))
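The title also mentions pasting. If a row can contain several different addresses and you want all of them in one cell, a base R sketch is to drop NA and blanks row by row, remove duplicates with unique(), and paste what is left. paste_unique() and the toy data frame below are hypothetical, not from the question or the answer:

```r
# hypothetical helper: paste the unique, non-blank values of each row
paste_unique <- function(df, sep = "; ") {
  # character matrix with one row per record (cbind keeps it a matrix
  # even when df has a single row)
  m <- do.call(cbind, lapply(df, as.character))
  out <- apply(m, 1, function(r) {
    r <- unique(r[!is.na(r) & r != ""])  # drop NA and blanks, then duplicates
    if (length(r) == 0) NA_character_ else paste(r, collapse = sep)
  })
  unname(out)
}

toy <- data.frame(A = c("1 Main St", "", NA),
                  B = c("1 Main St", "2 Oak Ave", ""),
                  C = c(NA, "3 Elm Rd", ""),
                  stringsAsFactors = FALSE)
paste_unique(toy)
```

Applied to dat from the question, this gives the same five values as the coalesce answers, because no row there holds two different addresses.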