I have many datasets that I want to merge so that each key appears only once. Here is some representative data:
df1 <- read.table(text="info var1 var2
1 C001 mytest1 NA
2 C002 mytest2 NA
3 C003 myse1 data1
4 C004 NA NA
5 C007 where1 India
6 C010 ohio city
11 C016 number fifty
12 C017 city rome", header=TRUE, stringsAsFactors=FALSE)
and these:
df2 <- read.table(text="info var1 var2
1 C003 myse1 data1
2 C007 where1 India
3 C010 ohio city
4 C016 number fifty
5 C017 city rome
6 C022 country India
7 C023 number 10", header=TRUE, stringsAsFactors=FALSE)
df3 <- read.table(text="info var1 var2 var3
1 C017 city rome ind
2 C022 country India bes
3 C027 this there NA", header=TRUE, stringsAsFactors=FALSE)
I want to combine them all on info, but with each info value appearing only once. To merge all of them, I currently do this:
library(tidyverse)
library(dplyr)
list(df1, df2, df3) %>% reduce(full_join, by = "info")
But I want the output to look like this:
info var1.x var2.x var3
C001 mytest1 NA NA
C002 mytest2 NA NA
C003 myse1 data1 NA
C004 NA NA NA
C007 where1 India NA
C010 ohio city NA
C016 number fifty NA
C017 city rome ind
C022 country India bes
C023 number 10 NA
C027 this there NA
Answer 0 (score: 1)
I think this should work for you.
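library(dplyr)

bind_rows(df1, df2, df3) %>%             # stack all rows; columns missing from a data frame become NA
  unique() %>%                           # drop exact duplicate rows
  mutate(rsum = rowSums(!is.na(.))) %>%  # count non-missing values in each row
  group_by(info) %>%
  filter(rsum == max(rsum)) %>%          # keep the most complete row(s) for each info
  ungroup() %>%
  select(-rsum)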
info var1 var2 var3
<chr> <chr> <chr> <chr>
1 C001 mytest1 <NA> <NA>
2 C002 mytest2 <NA> <NA>
3 C003 myse1 data1 <NA>
4 C004 <NA> <NA> <NA>
5 C007 where1 India <NA>
6 C010 ohio city <NA>
7 C016 number fifty <NA>
8 C023 number 10 <NA>
9 C017 city rome ind
10 C022 country India bes
11 C027 this there <NA>
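One caveat with the filter step above: filter(rsum == max(rsum)) keeps every row tied for the most non-missing values, so an info whose sources disagree on a value would still appear more than once. That does not happen with the sample data. The following is a minimal sketch of a variant that always keeps exactly one row per info, assuming dplyr 1.0.0 or later for slice_max(). The toy data frames a and b are hypothetical and are not the question's data.

```r
library(dplyr)

# Hypothetical toy inputs: both rows for K1 are complete but disagree on var2
a <- data.frame(info = c("K1", "K2"), var1 = c("x", "y"), var2 = c("p", NA),
                stringsAsFactors = FALSE)
b <- data.frame(info = c("K1", "K3"), var1 = c("x", "z"), var2 = c("q", "r"),
                stringsAsFactors = FALSE)

one_per_info <- bind_rows(a, b) %>%
  distinct() %>%                                   # drop exact duplicate rows
  mutate(rsum = rowSums(!is.na(.))) %>%            # count non-missing values per row
  group_by(info) %>%
  slice_max(rsum, n = 1, with_ties = FALSE) %>%    # exactly one row per info, even on ties
  ungroup() %>%
  select(-rsum)

one_per_info
```

With with_ties = FALSE the choice between tied rows is arbitrary, so this only suits cases where any one of the conflicting rows is acceptable.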
Answer 1 (score: 0)
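The following solution first builds the unique keys on which the datasets will be merged, i.e. the shared "info" column. It then uses left joins to add the individual columns: var1 and var2 collected from df1, df2 and df3, and var3 from df3.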
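library(dplyr)  # used here only for the %>% pipe
# All unique keys across the three data frames
info <- data.frame(info = unique(c(df1$info, df2$info, df3$info)))
# Unique (info, value) pairs for each variable
var1s <- unique(rbind(df1[, c("info", "var1")], df2[, c("info", "var1")], df3[, c("info", "var1")]))
var2s <- unique(rbind(df1[, c("info", "var2")], df2[, c("info", "var2")], df3[, c("info", "var2")]))
var3s <- unique(df3[, c("info", "var3")])
# Left-join each variable onto the key table
merge(x = info, y = var1s, by = "info", all.x = TRUE) %>%
  merge(y = var2s, by = "info", all.x = TRUE) %>%
  merge(y = var3s, by = "info", all.x = TRUE)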
Result:
> merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)
info var1 var2 var3
1 C001 mytest1 <NA> <NA>
2 C002 mytest2 <NA> <NA>
3 C003 myse1 data1 <NA>
4 C004 <NA> <NA> <NA>
5 C007 where1 India <NA>
6 C010 ohio city <NA>
7 C016 number fifty <NA>
8 C017 city rome ind
9 C022 country India bes
10 C023 number 10 <NA>
11 C027 this there <NA>
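One thing to watch with this approach: unique(rbind(...)) keeps both an (info, NA) pair and an (info, value) pair when one data frame has a missing value and another has the real one, so the merge would then return that info twice. That does not happen with the sample data. The following is a minimal sketch of a different technique that takes the first non-missing value per column within each info, assuming dplyr 1.0.0 or later for across(). The toy data frames a and b are hypothetical.

```r
library(dplyr)

# Hypothetical toy inputs: K2 has var1 missing in a but present in b
a <- data.frame(info = c("K1", "K2"), var1 = c("x", NA),
                stringsAsFactors = FALSE)
b <- data.frame(info = c("K2", "K3"), var1 = c("y", "z"), var3 = c("m", NA),
                stringsAsFactors = FALSE)

merged <- bind_rows(a, b) %>%   # stack rows; var3 is filled with NA for a
  group_by(info) %>%
  # For every non-key column, take the first non-missing value in the group;
  # indexing an empty vector with [1] yields NA of the column's own type
  summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")

merged
```

If two sources hold different non-missing values for the same info, this silently keeps the one that comes first in the stacked rows, so it assumes the sources agree wherever they overlap.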