I want to bind_rows two data frames. I have tried many solutions, but none of them gave me a good result.
df1 <-
id | data1 | data2 | data3 | data4
1 | pl-lo | pl | lo | lo
2 | lo | st-lo | pl | pl
3 | pl | lo | |
4 | | | pl-pa | lo
5 | st-lo | pl | st | pl-lo
df2 <-
id | data1 | data2 | data3 | data4 | data5
1 | pl | lo | st | pl | pl
6 | pl | | pl | pl | st
7 | lo | lo | lo | |
4 | | | | lo |
3 | st | pl | st | pl | pl
I want to get this output:
id | data1 | data2 | data3 | data4 | data5
1 | pl-lo-pl| pl-lo | lo-st | lo-pl | pl
2 | lo | st-lo | pl | pl |
3 | pl-st | lo-pl | st | pl | pl
4 | | | pl-pa | lo-lo |
6 | pl | | pl | pl | st
7 | lo | lo | lo | |
I tried this, but it gives me the wrong output:
output <- bind_rows(df1, df2) %>%
  group_by(id) %>%
  summarise_if(., is.character, funs(paste0(., collapse = "-")))
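Answer 0: (score: 1)
Assuming your empty values are NA, you can use mutate_all after summarise_if to remove the NAs that were pasted into the concatenated strings.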
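library(dplyr)

bind_rows(df1, df2) %>%
  group_by(id) %>%
  # paste0() turns each NA into the literal string "NA"
  summarise_if(., is.character, funs(paste0(., collapse = "-"))) %>%
  # strip those "NA" pieces together with their separators
  mutate_all(funs(stringr::str_replace_all(., "NA-|-NA|NA", "")))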
# A tibble: 7 x 6
id data1 data2 data3 data4 data5
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 pl-lo-pl pl-lo lo-st lo-pl pl
2 2 lo st-lo pl pl ""
3 3 pl-st lo-pl st pl pl
4 4 "" "" pl-pa lo-lo ""
5 5 st-lo pl st pl-lo ""
6 6 pl "" pl pl st
7 7 lo lo lo "" ""
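As a possible variation on the approach above, the NAs can be dropped before pasting instead of being removed from the string afterwards. This avoids the regular expression, which would also delete the letters "NA" from any real value that happens to contain them. The sketch below assumes dplyr 1.0.0 or later (for across()), assumes the blank cells in the question are NA, and rebuilds df1 and df2 by hand from the tables in the question; the helper name collapse_non_na is made up for illustration.

```r
library(dplyr)

# Hand-built copies of the question's tables; blank cells are assumed to be NA
df1 <- tibble(
  id    = c(1, 2, 3, 4, 5),
  data1 = c("pl-lo", "lo", "pl", NA, "st-lo"),
  data2 = c("pl", "st-lo", "lo", NA, "pl"),
  data3 = c("lo", "pl", NA, "pl-pa", "st"),
  data4 = c("lo", "pl", NA, "lo", "pl-lo")
)
df2 <- tibble(
  id    = c(1, 6, 7, 4, 3),
  data1 = c("pl", "pl", "lo", NA, "st"),
  data2 = c("lo", NA, "lo", NA, "pl"),
  data3 = c("st", "pl", "lo", NA, "st"),
  data4 = c("pl", "pl", NA, "lo", "pl"),
  data5 = c("pl", "st", NA, NA, "pl")
)

# Hypothetical helper: join the non-missing values of one column within one group.
# A group with only NAs yields the empty string "".
collapse_non_na <- function(x) paste(x[!is.na(x)], collapse = "-")

result <- bind_rows(df1, df2) %>%
  group_by(id) %>%
  summarise(across(where(is.character), collapse_non_na))

print(result)
```

Unlike the mutate_all version, this leaves id untouched, so it stays numeric rather than being converted to character.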
Answer 1: (score: 0)
This should work.
library(dplyr)
library(tidyr)

col1 <- colnames(mtcars)
col2 <- colnames(mtcars)
mtcars3 <- mtcars %>%
  mutate(names = rownames(mtcars)) %>%
  group_by(names)
# remove one to show what happens when you have a missing level
mtcars5 <- filter(mtcars3, names != "Mazda RX4") %>%
  arrange(names)
# keeps only the same in both dataframes
mtcars7 <- inner_join(mtcars3, select(mtcars5, names), by = "names") %>%
  arrange(names)
# paste columns together
mtcars9 <- as.data.frame(
  mapply(paste, mtcars5[col1], mtcars7[col2], MoreArgs = list(sep = "-")),
  stringsAsFactors = FALSE
)
# rename names (also duplicates)
mtcars9$names <- mtcars5$names
# get solo column that was not in both dfs
mtcars10 <- anti_join(mtcars3, mtcars5, by = "names")
# bind on
mtcars11 <- bind_rows(mtcars9, sapply(mtcars10, as.character))