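How to merge on one column and keep the result unique

Time: 2019-03-20 16:48:44

Tags: r

I have many data sets that I want to merge so that each key appears only once. Here is a representative example: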

df1 <- read.table(text="info   var1 var2
1       C001        mytest1       NA
2       C002        mytest2       NA
3       C003  myse1        data1
4       C004        NA       NA
5       C007 where1        India
6       C010 ohio        city
11      C016 number        fifty
12      C017 city        rome", header=T, stringsAsFactors=F)

and this

df2 <- read.table(text="info   var1  var2
1      C003 myse1 data1
2      C007 where1 India
3      C010 ohio city
4      C016 number        fifty
5      C017 city        rome
6      C022 country India
7      C023 number 10", header=T, stringsAsFactors=F)

df3 <- read.table(text="info   var1  var2 var3
1      C017 city        rome  ind
2      C022 country India     bes
3      C027 this  there  NA", header=T, stringsAsFactors=F)

I want to combine them all based on info, with each info value appearing only once. When I merge all the files, I do this:

library(tidyverse)
library(dplyr)
list(df1, df2, df3) %>% reduce(full_join, by = "info")
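
A note on why this join does not give the desired shape: `full_join` matches only on `info`, so the other columns that share a name are kept side by side and get `.x` / `.y` suffixes instead of being merged. A minimal sketch, using small hypothetical stand-ins `a`, `b` and `d` (a few rows taken from the data above, not the full data frames):

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for df1, df2 and df3 (subset of the rows above)
a <- data.frame(info = c("C016", "C017"),
                var1 = c("number", "city"),
                var2 = c("fifty", "rome"),
                stringsAsFactors = FALSE)
b <- data.frame(info = c("C017", "C022"),
                var1 = c("city", "country"),
                var2 = c("rome", "India"),
                stringsAsFactors = FALSE)
d <- data.frame(info = c("C017", "C027"),
                var1 = c("city", "this"),
                var2 = c("rome", "there"),
                var3 = c("ind", NA),
                stringsAsFactors = FALSE)

# Same call as in the question, applied to the stand-ins
joined <- list(a, b, d) %>% reduce(full_join, by = "info")

# One row per info, but var1 and var2 now exist three times each
names(joined)
```

The rows are already unique per `info`; what is missing is a step that collapses the repeated `var1` and `var2` columns into one each.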

But I would like the output to look like this:

info    var1.x   var2.x  var3
C001    mytest1  NA      NA
C002    mytest2  NA      NA
C003    myse1    data1   NA
C004    NA       NA      NA
C007    where1   India   NA
C010    ohio     city    NA
C016    number   fifty   NA
C017    city     rome    ind
C022    country  India   bes
C023    number   10      NA
C027    this     there   NA

2 answers:

Answer 0 (score: 1)

I think this should work for you.

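library(dplyr)

bind_rows(df1, df2, df3) %>%
  unique() %>%                             # drop rows that are exact duplicates
  mutate(rsum = rowSums(!is.na(.))) %>%    # count non-missing values per row
  group_by(info) %>%
  filter(rsum == max(rsum)) %>%            # keep the most complete row per info
  ungroup() %>%
  select(-rsum)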

   info  var1    var2  var3 
   <chr> <chr>   <chr> <chr>
 1 C001  mytest1 <NA>  <NA> 
 2 C002  mytest2 <NA>  <NA> 
 3 C003  myse1   data1 <NA> 
 4 C004  <NA>    <NA>  <NA> 
 5 C007  where1  India <NA> 
 6 C010  ohio    city  <NA> 
 7 C016  number  fifty <NA> 
 8 C023  number  10    <NA> 
 9 C017  city    rome  ind  
10 C022  country India bes  
11 C027  this    there <NA> 
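
One caveat with keeping the most complete row: if two rows for the same info tie on the number of non-missing values, both are kept. A possible variation, not part of the answer above, is to fill each column with its first non-missing value per info. The sketch below assumes dplyr 1.0 or later (for `across`) and uses hypothetical toy frames `x` and `y` plus a hypothetical helper `first_non_na`:

```r
library(dplyr)

# Hypothetical toy data: a few rows in the shape of df2 and df3
x <- data.frame(info = c("C017", "C022"),
                var1 = c("city", "country"),
                var2 = c("rome", "India"),
                stringsAsFactors = FALSE)
y <- data.frame(info = c("C017", "C022", "C027"),
                var1 = c("city", "country", "this"),
                var2 = c("rome", "India", "there"),
                var3 = c("ind", "bes", NA),
                stringsAsFactors = FALSE)

# Hypothetical helper: first non-missing value, or NA if all are missing
first_non_na <- function(v) v[!is.na(v)][1]

combined <- bind_rows(x, y) %>%
  group_by(info) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

combined
```

This always returns exactly one row per info, but when two sources disagree on a value it silently keeps the first one, so it is only a reasonable choice if the sources are known to be consistent.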

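Answer 1 (score: 0)

The solution below first builds the unique keys on which the data sets are merged, i.e. the shared "info" column. It then uses left joins to add the individual columns: var1 and var2 collected from df1, df2 and df3, and var3 from df3.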

library(dplyr)
info <- data.frame(info=unique(c(df1$info,df2$info,df3$info)))
var1s <- unique(rbind(df1[,c("info","var1")],df2[,c("info","var1")],df3[,c("info","var1")]))
var2s <- unique(rbind(df1[,c("info","var2")],df2[,c("info","var2")],df3[,c("info","var2")]))
var3s <- unique(df3[,c("info","var3")])
merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)

Result:

> merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)
   info    var1  var2 var3
1  C001 mytest1  <NA> <NA>
2  C002 mytest2  <NA> <NA>
3  C003   myse1 data1 <NA>
4  C004    <NA>  <NA> <NA>
5  C007  where1 India <NA>
6  C010    ohio  city <NA>
7  C016  number fifty <NA>
8  C017    city  rome  ind
9  C022 country India  bes
10 C023  number    10 <NA>
11 C027    this there <NA>
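
This gives one row per info here because every info has a single value in each column across the three data frames. If the same info carried two different values in one column, or NA in one source and a value in another, `unique()` would keep both pairs and the merge would return that info twice. A minimal base R sketch of a check that could be run on `var1s` or `var2s` before merging, shown on hypothetical toy data:

```r
# Hypothetical toy data: C003 has two different var1 values
var1s <- data.frame(info = c("C001", "C003", "C003"),
                    var1 = c("mytest1", "myse1", "other"),
                    stringsAsFactors = FALSE)
var1s <- unique(var1s)

# Keys that still occur more than once after removing exact duplicates
dup_keys <- unique(var1s$info[duplicated(var1s$info)])
dup_keys
```

An empty `dup_keys` means the left joins will not multiply rows for that column.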