Select only the matching columns in 2 different data frames in R

Time: 2019-12-11 12:52:51

Tags: r

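I have 106 columns in the first DF and 97 columns in the second DF, and I want to merge them. To do that, I need the same columns in both DFs.

So how can I achieve the following (listed below)?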

DF1: column names are A, B, C & D
DF2: column names are A, B & E

Is it possible to select the following combinations of columns from the data frames?

1) Match in both i.e A & B 
2) Extras in 2nd i.e E
3) Extras in first i.e C & D

I tried colnames(df1) == colnames(df2) with select() in dplyr, along with various other approaches, but none of them worked.

Below is Dataframe 1:

  [1] "ï..Lan.ID"                 "NBFC"                      "Application.ID"           
  [4] "Region"                    "Loan.City"                 "Loan.Type"                
  [7] "Loan.Scheme"               "Name"                      "Mobile.Number"            
 [10] "Loan.Status"               "Principal.Outstanding"     "Last.EMI"                 
 [13] "Next.EMI"                  "Next.Bullet.Month"         "Next.Bullet.Amount"       
 [16] "Sum.Instalment.Posted"     "Dues.Receipts"             "EMI.Due"                  
 [19] "All.Dues"                  "Instalment.Dues"           "Bullets.Overdue"          
 [22] "Loan.Quality"              "Sanctioned.Amount"         "Loan.Amount"              
 [25] "Tenure"                    "Completed.Tenure"          "Tenure.Left"              
 [28] "Personal.Email"            "Official.Email"            "No..Of.Late.Payments"     
 [31] "CRIF.Score"                "CIBIL.Score"               "No.of.Actions"            
 [34] "Fixed.Income"              "ECS.Customer.Name"         "ECS.Bank.Name"            
 [37] "ECS.Account.Number"        "Loan.Date"                 "Sanction.Month"           
 [40] "EMI.Start.Date"            "X1st.EMI.Month"            "End.Date"                 
 [43] "Home.Address"              "Permanent.Address"         "Employer.Name"            
 [46] "Company.MCA.ID"            "Business.Address"          "Reference.Details"        
 [49] "Nature.of.Business"        "Pan.Card"                  "Aadhar.UID"               
 [52] "Gender"                    "Educational.Qualification" "DOB"                      
 [55] "Marital.Status"            "Last.Payment.Date"         "Job.Type"                 
 [58] "Employment.Year"           "Cycle.Date"                "Age"                      
 [61] "relevant_pos"              "crif_active_accounts"      "crif_overdue_amt"         
 [64] "crif_current_outstanding"  "cibil_active_accounts"     "cibil_overdue_amt"        
 [67] "cibil_current_outstanding" "NACH.Status"               "Awarenss.Allocation"      
 [70] "Allocation.Date"           "Awareness.Data"            "Awareness.Brk.up"         
 [73] "Dec.19.EMI.Amount"         "Tenure.End"                "Dec.19.BKt"               
 [76] "DPD"                       "New.DPD"                   "DPD.Range.New"            
 [79] "New.Amount.Due"            "New.Total.Due"             "Loan.Slabs"               
 [82] "Last.Month.Bnc"            "X1st.EMI"                  "Dec.19.Bnc"               
 [85] "Dec.19.Non.Starter"        "Reason.of.Bnc"             "HNI"                      
 [88] "EMI.Due.1"                 "OS"                        "Advance.Paid"             
 [91] "Paid.Unpaid"               "Not.Allocated"             "Excess"                   
 [94] "CC.Take.Over...OD"         "Last.Month.delinq"         "Loan.Status.1"            
 [97] "CIBIL.Bracket"             "Salary.Bracket"            "DPD.1"                    
[100] "Reason.of.Default"         "Contactibility"            "Delinq"                   
[103] "PayTm.Industry"            "Industry"                  "Employer.Name.1"          
[106] "DELINQ.NON.DELINQ"

Dataframe 2:

 [1] "ï..Lan.ID"                 "NBFC"                      "Application.ID"           
 [4] "Region"                    "Loan.City"                 "Loan.Type"                
 [7] "Loan.Scheme"               "Name"                      "Mobile.Number"            
[10] "Loan.Status"               "Principal.Outstanding"     "Last.EMI"                 
[13] "Next.EMI"                  "Next.Bullet.Month"         "Next.Bullet.Amount"       
[16] "Sum.Instalment.Posted"     "Dues.Receipts"             "EMI.Due"                  
[19] "All.Dues"                  "Instalment.Dues"           "Bullets.Overdue"          
[22] "Loan.Quality"              "Sanctioned.Amount"         "Loan.Amount"              
[25] "Tenure"                    "Completed.Tenure"          "Tenure.Left"              
[28] "Personal.Email"            "Official.Email"            "No..Of.Late.Payments"     
[31] "CRIF.Score"                "CIBIL.Score"               "No.of.Actions"            
[34] "Fixed.Income"              "ECS.Customer.Name"         "ECS.Bank.Name"            
[37] "ECS.Account.Number"        "Loan.Date"                 "Sanction.Month"           
[40] "EMI.Start.Date"            "X1st.EMI.Month"            "End.Date"                 
[43] "Home.Details"              "Permanent.Address.Details" "Employer.Name"            
[46] "Company.MCA.ID"            "Business.Details"          "Reference.Details"        
[49] "Nature.of.Business"        "Pan.Card"                  "Aadhar.UID"               
[52] "Gender"                    "Educational.Qualification" "DOB"                      
[55] "Marital.Status"            "Last.Payment.Date"         "Job.Type"                 
[58] "Employment.Year"           "Cycle.Date"                "Age"                      
[61] "relevant_pos"              "crif_active_accounts"      "crif_overdue_amt"         
[64] "crif_current_outstanding"  "cibil_active_accounts"     "cibil_overdue_amt"        
[67] "cibil_current_outstanding" "NACH.status"               "Awarenss.Allocation"      
[70] "Allocation.Date"           "Awareness.Data"            "Awareness.Brk.up"         
[73] "June.19.EMI.Amount"        "Tenure.End"                "June.BKt"                 
[76] "Loan.Slabs"                "Last.Month.Bnc"            "X1st.EMI"                 
[79] "June.19.Bnc"               "June.19.Non.Starter"       "Reason.of.Bnc"            
[82] "HNI"                       "EMI.Due.1"                 "OS"                       
[85] "Advance.Paid"              "PAID.Unpaid"               "Not.Allocated"            
[88] "Excess"                    "DPD"                       "CC.Take.Over"             
[91] "Last.Month.delinq"         "Loan.Status.1"             "CIBIL.Bracket"            
[94] "Salary.Bracket"            "DPD.1"                     "DELINQ.NON.DELINQ"        
[97] "Month"

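The expected result here would be the names of the columns that match in both DFs and the names of the columns that do not.

1 answer:

Answer 0 (score: 1)

I think Sotos' comment gives the most elegant output for your problem.

However, you can also use %in%: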

O1 = colnames(dfA)[colnames(dfA) %in% colnames(dfB)]

> O1
[1] "A" "B" "C"
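As a point of comparison, base R's intersect() should return the same common names as the %in% subsetting above. This is only a minimal sketch: the values in dfA and dfB here are made up for illustration, and O2 is a hypothetical name that does not appear in the original answer.

```r
# Illustrative data with the same column names as the answer's dfA / dfB
# (the values are invented; only the names matter here)
dfA <- data.frame(A = 1:4, B = 5:8, C = 9:12, D = 13:16)
dfB <- data.frame(A = 1:4, B = 5:8, C = 9:12, E = 13:16)

# Approach from the answer: keep dfA's names that also occur in dfB
O1 <- colnames(dfA)[colnames(dfA) %in% colnames(dfB)]

# Alternative: set intersection of the two name vectors
O2 <- intersect(colnames(dfA), colnames(dfB))

# Both approaches give the same names in the same order for this data
same_result <- identical(O1, O2)

# Keep only the common columns of dfA (drop = FALSE keeps a data frame)
dfA_common <- dfA[, O2, drop = FALSE]
```

Either form can then be used to subset both data frames to the same set of columns.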

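However, your matching conditions 2) and 3) are a bit confusing, because when you ask for:

2) Same in both and in 2nd i.e A, B & E

I think that corresponds to all the columns in the second dataset (colnames(dfB)).

3) Present in first i.e A, B, C & D and same in both

This corresponds to all the columns in the first dataset (colnames(dfA)).

Does that make sense to you? Did I miss something in your merging pattern?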
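If conditions 2) and 3) are instead read the way the question lists them (E only in the second data frame, C and D only in the first), setdiff() is one way to get those names. A minimal sketch under that reading; df1 and df2 are hypothetical stand-ins with the column names from the question, and the values are invented.

```r
# Hypothetical stand-ins for the question's DF1 (A, B, C, D) and DF2 (A, B, E)
df1 <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)
df2 <- data.frame(A = 1:2, B = 3:4, E = 9:10)

# 1) names present in both data frames
in_both <- intersect(colnames(df1), colnames(df2))

# 2) names present only in the second data frame
only_in_df2 <- setdiff(colnames(df2), colnames(df1))

# 3) names present only in the first data frame
only_in_df1 <- setdiff(colnames(df1), colnames(df2))
```

Note that these comparisons are case-sensitive, so in the column listings from the question "NACH.Status" and "NACH.status" would be reported as two different columns.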

Data

dfA = data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfA) = LETTERS[1:4]

dfB = data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfB) = LETTERS[c(1:3,5)]

> dfA
   A  B  C  D
1 75 66 17 89
2 46  7 27 38
3 97 26 47 31
4 32 20 71  2

> dfB
   A  B  C  E
1 94 70 18 16
2 69 57 29 60
3 53 50 25 96
4 37 51 64 75
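Since the stated goal is to merge the two data frames, one possible follow-up is to subset both to their common columns and stack them with rbind(). This is a sketch under the assumption that "merge" means appending rows; the set.seed() call and the names common and stacked are additions for illustration and are not part of the original answer.

```r
set.seed(1)  # assumption: added only so the example is reproducible

# Same construction as the example data above
dfA <- data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfA) <- LETTERS[1:4]          # A, B, C, D
dfB <- data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfB) <- LETTERS[c(1:3, 5)]    # A, B, C, E

# Column names shared by both data frames
common <- intersect(colnames(dfA), colnames(dfB))

# Keep only the shared columns in each, then append the rows
stacked <- rbind(dfA[, common, drop = FALSE],
                 dfB[, common, drop = FALSE])
```

If the non-matching columns should be kept and filled with NA instead of dropped, a different approach would be needed; this sketch only covers the case where they are discarded.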