Look up values from multiple sets of columns

Date: 2015-11-02 20:15:53

Tags: r dplyr lookup

This is essentially an Excel VLOOKUP problem. I have the following dataset:

dat1 <- read.table(header=TRUE, text="
ID  Name1   Name2
1384    Rem_Ps  Tel_Nm
1442    Teq_Ls  Sel_Nm
1340    Fem_Bs  Tem_Mn
1419    Few_Bn  Ten_Gf
1359    Fem_Bs  Tem_Mn
1237    Qwl_Po  Mnt_Pj
1288    Tem_na  Tem_Rt
1261    Sem_Na  Tel_Tr
1382    Rem_Ps  Tel_Nm
1316    Fem_Bs  Tem_Mn
1279    Sem_Na  Yem_Rt
1366    Sel_Ve  Mkl_Po
1269    Rem_Ps  Tel_Nm

                   ")
dat1
     ID  Name1  Name2
1  1384 Rem_Ps Tel_Nm
2  1442 Teq_Ls Sel_Nm
3  1340 Fem_Bs Tem_Mn
4  1419 Few_Bn Ten_Gf
5  1359 Fem_Bs Tem_Mn
6  1237 Qwl_Po Mnt_Pj
7  1288 Tem_na Tem_Rt
8  1261 Sem_Na Tel_Tr
9  1382 Rem_Ps Tel_Nm
10 1316 Fem_Bs Tem_Mn
11 1279 Sem_Na Yem_Rt
12 1366 Sel_Ve Mkl_Po
13 1269 Rem_Ps Tel_Nm

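The dataset above should be looked up against the dataset below. Both lookup values, Name1 and Name2, are searched for in the seven columns QC1 to NC3 of dat2. To be more specific: a row counts as a valid match only if Name1 is found in those seven columns and Name2 is also found in those seven columns of the same dat2 row. For example, the second row has the two values Teq_Ls and Sel_Nm. Since Teq_Ls cannot be found in any of the seven columns, that row is dropped.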

dat2 <- read.table(header=TRUE, text="
ID1 REQ REM QC1 QC2 QC3 QC4 NC1 NC2 NC3
AB1 1123    44ed    Fem_Bs  Ten_Gf  NA  NA  Tem_Mn  Tem_Mn  NA
AB2 123 331s    Tem_Rt  Qwl_Po  NA  Ten_Gf  NA  Tem_Mn  Mnt_Pj
AB3 123 334q    Ten_Gf  Tem_Mn  Sem_Na  Tem-Mn  Tel_Tr  NA  NA
AB4 1234    33ey    Sem_Na  NA  NA  NA  Tem_Rt  NA  Yem_Rt
AB5 13243   ed43    Rem_Ps  NA  NA  Tem_Mn  NA  Tel_Nm  NA
AB6 123 34rt    NA  Ten_Gf  NA  Sel_Ve  Mkl_Po  Tem_Rt  NA

                   ")
dat2

  ID1   REQ  REM    QC1    QC2    QC3    QC4    NC1    NC2    NC3
1 AB1  1123 44ed Fem_Bs Ten_Gf   <NA>   <NA> Tem_Mn Tem_Mn   <NA>
2 AB2   123 331s Tem_Rt Qwl_Po   <NA> Ten_Gf   <NA> Tem_Mn Mnt_Pj
3 AB3   123 334q Ten_Gf Tem_Mn Sem_Na Tem-Mn Tel_Tr   <NA>   <NA>
4 AB4  1234 33ey Sem_Na   <NA>   <NA>   <NA> Tem_Rt   <NA> Yem_Rt
5 AB5 13243 ed43 Rem_Ps   <NA>   <NA> Tem_Mn   <NA> Tel_Nm   <NA>
6 AB6   123 34rt   <NA> Ten_Gf   <NA> Sel_Ve Mkl_Po Tem_Rt   <NA>

The desired result looks like this:

ID    Name1   Name2   ID1  REQ    REM
1384  Rem_Ps  Tel_Nm  AB5  13243  ed43
1340  Fem_Bs  Tem_Mn  AB1  1123   44ed
1359  Fem_Bs  Tem_Mn  AB1  1123   44ed
1237  Qwl_Po  Mnt_Pj  AB2  123    331s
1261  Sem_Na  Tel_Tr  AB3  123    334q
1382  Rem_Ps  Tel_Nm  AB5  13243  ed43
1316  Fem_Bs  Tem_Mn  AB1  1123   44ed
1279  Sem_Na  Yem_Rt  AB4  1234   33ey
1366  Sel_Ve  Mkl_Po  AB6  123    34rt
1269  Rem_Ps  Tel_Nm  AB5  13243  ed43
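To make the matching rule concrete, here is a minimal base-R sketch that applies it to a single (Name1, Name2) pair. The helper `find_matches()` and the vector `lookup_cols` are hypothetical names introduced only for illustration, and the sketch assumes the lookup columns are exactly QC1 to NC3 and that both names must occur in the same dat2 row, which is what the expected result implies.

```r
# Hypothetical helper illustrating the rule: a (Name1, Name2) pair is valid
# only if BOTH names occur among QC1..NC3 of the same dat2 row.
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID1 REQ REM QC1 QC2 QC3 QC4 NC1 NC2 NC3
AB1 1123 44ed Fem_Bs Ten_Gf NA NA Tem_Mn Tem_Mn NA
AB2 123 331s Tem_Rt Qwl_Po NA Ten_Gf NA Tem_Mn Mnt_Pj
AB3 123 334q Ten_Gf Tem_Mn Sem_Na Tem-Mn Tel_Tr NA NA
AB4 1234 33ey Sem_Na NA NA NA Tem_Rt NA Yem_Rt
AB5 13243 ed43 Rem_Ps NA NA Tem_Mn NA Tel_Nm NA
AB6 123 34rt NA Ten_Gf NA Sel_Ve Mkl_Po Tem_Rt NA
")

# The seven lookup columns (assumed to be exactly QC1..NC3).
lookup_cols <- c("QC1", "QC2", "QC3", "QC4", "NC1", "NC2", "NC3")

# Returns the indices of the lookup rows whose seven columns contain both names.
find_matches <- function(name1, name2, lookup) {
  both_found <- apply(lookup[, lookup_cols], 1,
                      function(r) name1 %in% r && name2 %in% r)
  unname(which(both_found))
}

find_matches("Teq_Ls", "Sel_Nm", dat2)  # no match, so row 1442 is dropped
find_matches("Sem_Na", "Tel_Tr", dat2)  # row 3 of dat2 (AB3)
```

Note that Sem_Na also appears in AB4, but only AB3 contains Tel_Tr as well, so the pair (Sem_Na, Tel_Tr) resolves to AB3 alone.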

1 Answer:

Answer 0 (score: 2)

Let's do it in base R:

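# Inner apply(): TRUE for each dat2 row that contains both Name1 (x[[2]]) and Name2 (x[[3]]) of dat1 row x.
# Outer apply() stacks these into a dat2-by-dat1 logical matrix; which(arr.ind = TRUE) returns (dat2, dat1) index pairs.
z <- which(apply(dat1, 1, function(x) apply(dat2, 1, function(y) x[[2]] %in% y & x[[3]] %in% y)), arr.ind = TRUE)

cbind(dat1[z[, 2], ], dat2[z[, 1], ])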

     ID  Name1  Name2 ID1   REQ  REM    QC1    QC2    QC3    QC4    NC1    NC2    NC3
1  1384 Rem_Ps Tel_Nm AB5 13243 ed43 Rem_Ps   <NA>   <NA> Tem_Mn   <NA> Tel_Nm   <NA>
3  1340 Fem_Bs Tem_Mn AB1  1123 44ed Fem_Bs Ten_Gf   <NA>   <NA> Tem_Mn Tem_Mn   <NA>
5  1359 Fem_Bs Tem_Mn AB1  1123 44ed Fem_Bs Ten_Gf   <NA>   <NA> Tem_Mn Tem_Mn   <NA>
6  1237 Qwl_Po Mnt_Pj AB2   123 331s Tem_Rt Qwl_Po   <NA> Ten_Gf   <NA> Tem_Mn Mnt_Pj
8  1261 Sem_Na Tel_Tr AB3   123 334q Ten_Gf Tem_Mn Sem_Na Tem-Mn Tel_Tr   <NA>   <NA>
9  1382 Rem_Ps Tel_Nm AB5 13243 ed43 Rem_Ps   <NA>   <NA> Tem_Mn   <NA> Tel_Nm   <NA>
10 1316 Fem_Bs Tem_Mn AB1  1123 44ed Fem_Bs Ten_Gf   <NA>   <NA> Tem_Mn Tem_Mn   <NA>
11 1279 Sem_Na Yem_Rt AB4  1234 33ey Sem_Na   <NA>   <NA>   <NA> Tem_Rt   <NA> Yem_Rt
12 1366 Sel_Ve Mkl_Po AB6   123 34rt   <NA> Ten_Gf   <NA> Sel_Ve Mkl_Po Tem_Rt   <NA>
13 1269 Rem_Ps Tel_Nm AB5 13243 ed43 Rem_Ps   <NA>   <NA> Tem_Mn   <NA> Tel_Nm   <NA>
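The answer compares Name1 and Name2 against every column of dat2 (including ID1, REQ and REM) and returns all dat2 columns. The variant below is a sketch, assuming only the seven columns QC1 to NC3 should be searched and only ID1, REQ and REM should be kept, so that the output has the layout requested in the question. The names `lookup_cols`, `lookup_mat`, `hits`, `idx` and `result` are illustrative and not part of the original answer.

```r
# Sketch (assumption): search only QC1..NC3 and keep only ID1, REQ, REM from dat2.
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Name1 Name2
1384 Rem_Ps Tel_Nm
1442 Teq_Ls Sel_Nm
1340 Fem_Bs Tem_Mn
1419 Few_Bn Ten_Gf
1359 Fem_Bs Tem_Mn
1237 Qwl_Po Mnt_Pj
1288 Tem_na Tem_Rt
1261 Sem_Na Tel_Tr
1382 Rem_Ps Tel_Nm
1316 Fem_Bs Tem_Mn
1279 Sem_Na Yem_Rt
1366 Sel_Ve Mkl_Po
1269 Rem_Ps Tel_Nm
")

dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID1 REQ REM QC1 QC2 QC3 QC4 NC1 NC2 NC3
AB1 1123 44ed Fem_Bs Ten_Gf NA NA Tem_Mn Tem_Mn NA
AB2 123 331s Tem_Rt Qwl_Po NA Ten_Gf NA Tem_Mn Mnt_Pj
AB3 123 334q Ten_Gf Tem_Mn Sem_Na Tem-Mn Tel_Tr NA NA
AB4 1234 33ey Sem_Na NA NA NA Tem_Rt NA Yem_Rt
AB5 13243 ed43 Rem_Ps NA NA Tem_Mn NA Tel_Nm NA
AB6 123 34rt NA Ten_Gf NA Sel_Ve Mkl_Po Tem_Rt NA
")

# Only the seven lookup columns take part in the match.
lookup_cols <- c("QC1", "QC2", "QC3", "QC4", "NC1", "NC2", "NC3")
lookup_mat  <- as.matrix(dat2[, lookup_cols])

# hits[i, j] is TRUE when dat2 row i contains both names of dat1 row j.
hits <- sapply(seq_len(nrow(dat1)), function(j) {
  apply(lookup_mat, 1, function(r) dat1$Name1[j] %in% r && dat1$Name2[j] %in% r)
})

# (row, col) index pairs: "row" indexes dat2, "col" indexes dat1 (ordered by dat1 row).
idx <- which(hits, arr.ind = TRUE)

result <- cbind(dat1[idx[, "col"], ], dat2[idx[, "row"], c("ID1", "REQ", "REM")])
rownames(result) <- NULL
result
```

With the sample data this keeps the same ten dat1 rows as the answer; restricting the search to QC1 to NC3 only matters if a name could also appear in ID1, REQ or REM.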