Combining two character data frames with the bind_rows function in R

Date: 2018-06-11 15:55:05

Tags: r dplyr

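I want to bind_rows two data frames and merge the rows that share an id. I have tried many solutions, but none of them gave me the right result.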

df1 <-

id      |   data1   |   data2   |   data3   |   data4
1       |   pl-lo   |   pl      |   lo      |   lo
2       |   lo      |   st-lo   |   pl      |   pl
3       |   pl      |   lo      |           |   
4       |           |           |   pl-pa   |   lo
5       |   st-lo   |   pl      |   st      |   pl-lo

df2 <-

id      |   data1   |   data2   |   data3   |   data4   | data5
1       |   pl      |   lo      |   st      |   pl      |   pl
6       |   pl      |           |   pl      |   pl      |   st
7       |   lo      |   lo      |   lo      |           |
4       |           |           |           |   lo      |
3       |   st      |   pl      |   st      |   pl      |   pl
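
For anyone who wants to run the code in this thread, the two tables can be rebuilt roughly as follows. This is only a sketch: it assumes the blank cells are missing values (`NA`) and that `id` is stored as character, as the printed tibble in the first answer suggests. Neither point is stated in the question.

```r
# Reconstruction of the two tables above; blank cells are ASSUMED to be NA
df1 <- data.frame(
  id    = c("1", "2", "3", "4", "5"),
  data1 = c("pl-lo", "lo", "pl", NA, "st-lo"),
  data2 = c("pl", "st-lo", "lo", NA, "pl"),
  data3 = c("lo", "pl", NA, "pl-pa", "st"),
  data4 = c("lo", "pl", NA, "lo", "pl-lo"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  id    = c("1", "6", "7", "4", "3"),
  data1 = c("pl", "pl", "lo", NA, "st"),
  data2 = c("lo", NA, "lo", NA, "pl"),
  data3 = c("st", "pl", "lo", NA, "st"),
  data4 = c("pl", "pl", NA, "lo", "pl"),
  data5 = c("pl", "st", NA, NA, "pl"),
  stringsAsFactors = FALSE
)
```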

I would like to get this output

id      |   data1   |   data2   |   data3   |   data4   |   data5
1       |   pl-lo-pl|   pl-lo   |   lo-st   |   lo-pl   |   pl
2       |   lo      |   st-lo   |   pl      |   pl      |   
3       |   pl-st   |   lo-pl   |   st      |   pl      |   pl
4       |           |           |   pl-pa   |   lo-lo   |
6       |   pl      |           |   pl      |   pl      |   st
7       |   lo      |   lo      |   lo      |           |

I tried the following, but it gives me the wrong output

 output <- bind_rows(df1, df2) %>%
    group_by(id) %>%
    summarise_if(.,is.character,funs(paste0(.,collapse = "-" )))
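
The question does not show the wrong output, but a likely cause (an assumption on my part) is that `paste0` converts a missing value to the literal string "NA" instead of skipping it. A minimal base R illustration with made-up values:

```r
# paste0() turns NA into the string "NA" rather than skipping it
x <- c("pl", NA)
with_na <- paste0(x, collapse = "-")

# dropping the missing values first avoids the stray "NA"
without_na <- paste0(x[!is.na(x)], collapse = "-")

print(with_na)     # "pl-NA"
print(without_na)  # "pl"
```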

2 Answers:

Answer 0: (score: 1)

Assuming your empty values are NA, you can add a mutate_all step after summarise_if to strip the "NA" strings that paste0 inserts.

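library(dplyr)

# Note: funs() has been deprecated since dplyr 0.8.0; it is kept here as originally posted
bind_rows(df1, df2) %>%
  group_by(id) %>%
  summarise_if(is.character, funs(paste0(., collapse = "-"))) %>%
  mutate_all(funs(stringr::str_replace_all(., "NA-|-NA|NA", "")))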


# A tibble: 7 x 6
  id    data1    data2 data3 data4 data5
  <chr> <chr>    <chr> <chr> <chr> <chr>
1 1     pl-lo-pl pl-lo lo-st lo-pl pl   
2 2     lo       st-lo pl    pl    ""   
3 3     pl-st    lo-pl st    pl    pl   
4 4     ""       ""    pl-pa lo-lo ""   
5 5     st-lo    pl    st    pl-lo ""   
6 6     pl       ""    pl    pl    st   
7 7     lo       lo    lo    ""    ""   
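
An alternative to cleaning up with a regular expression is to drop the missing values before pasting, which would also leave untouched any real value that happens to contain the letters "NA". The following is a minimal sketch, not the answerer's method: it assumes dplyr 1.0.0 or later (for `across()`), uses a toy subset of the question's data (ids 1 and 4, blanks assumed to be `NA`), and `paste_non_na` is a hypothetical helper name.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Toy subset of the question's data (ids 1 and 4 only); blanks assumed to be NA
a <- tibble(id = c("1", "4"), data1 = c("pl-lo", NA), data4 = c("lo", "lo"))
b <- tibble(id = c("1", "4"), data1 = c("pl", NA), data4 = c("pl", "lo"),
            data5 = c("pl", NA))

# Hypothetical helper: collapse only the non-missing values of a vector
paste_non_na <- function(x) paste(x[!is.na(x)], collapse = "-")

combined <- bind_rows(a, b) %>%
  group_by(id) %>%
  summarise(across(where(is.character), paste_non_na))

print(combined)
```

Groups whose values are all missing collapse to an empty string, matching the `""` cells in the output above.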

Answer 1: (score: 0)

This should work. The example uses the built-in mtcars data set rather than the data from the question.

library(dplyr)
library(tidyr)

col1 <- colnames(mtcars)

col2 <- colnames(mtcars)


mtcars3 <- mtcars %>% 
          mutate(names = rownames(mtcars)) %>% 
          group_by(names)
#remove one to show what happens when you have a missing level
mtcars5 <- filter(mtcars3, names != "Mazda RX4") %>% 
            arrange(names)
#keeps only the same in both dataframes 
mtcars7 <- inner_join(mtcars3,select(mtcars5,names), by='names') %>% 
            arrange(names) 


#paste columns together
mtcars9 <- as.data.frame(mapply(paste,mtcars5[col1], mtcars7[col2], MoreArgs = list(sep="-")),stringsAsFactors = FALSE)
#rename names (also duplicates)
mtcars9$names <- mtcars5$names
#get solo column that was not in both dfs
mtcars10 <- anti_join(mtcars3,mtcars5, by='names')
#bind on
mtcars11 <- bind_rows(mtcars9,sapply(mtcars10,as.character))