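Filter a data frame with multiple conditions

Asked: 2016-07-16 13:20:24

Tags: r filter

I need to update a spreadsheet with 1000 rows, but I am running into problems.

I have two data sets: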

DF

CompanyID1       TMC1
ABC company      QBT
BCD company      G W TMC
jb hi fi         QBT
ABC company      GW TMC
FB Company       AMEX
LL company       AMEX
j k              QBT
k. l company     TP oil
1 to 1 lts       TP oil
2 in 1 pty ltd.  AMEX

DF2

DRA  CompanyID2       TMC2    Status
11   2 in 1 pty ltd.  AMEX    sent
12   1 to 1 lts       TP oil  produce
13   BCD company      ACE     sent
14   k. l company     TP oil  sent
15   jb hi fi         QBT     produce
16   ABC company      QBT     sent
17   j k              QBT     sent
18   FB Company       AMEX    sent
19   facebook pty     QBT     sent
20   2 in 1 pty ltd.  AMEX    produce

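What I want to achieve: first look up each df$CompanyID1 value in df2$CompanyID2. If it matches, and df$TMC1 also matches df2$TMC2, then check the status. If df2$Status=='sent', create a new column df$new and return the df2$DRA value; if df2$Status=='produce', then df$new should contain 'delete'.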

Example

From df:

"ABC company" from df$CompanyID1 exists in df2$CompanyID2. ABC company's df$TMC1 matches df2$TMC2, and df2$Status=='sent'. Hence:

df$new <- 16

I would greatly appreciate your help. This will save a lot of time, which I can put towards other productive work. Thanks

dput(DF1)

structure(list(Company.ID1 = structure(c(3L, 4L, 7L, 3L, 5L, 
9L, 6L, 8L, 1L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.", 
"ABC company", "BCD company", "FB Company", "j k ", "jb hi fi", 
"k. l company", "LL company"), class = "factor"), TMC1 = structure(c(4L, 
2L, 4L, 3L, 1L, 1L, 4L, 5L, 5L, 1L), .Label = c("AMEX", "G W TMC", 
"GW TMC", "QBT", "TP oil"), class = "factor")), .Names = c("Company.ID1", 
"TMC1"), class = "data.frame", row.names = c(NA, -10L))

dput(DF2)

structure(list(DRA = 11:20, Company.ID2 = structure(c(2L, 1L, 
4L, 9L, 8L, 3L, 7L, 6L, 5L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.", 
"ABC company", "BCD company", "facebook pty", "FB Company", "j k ", 
"jb hi fi", "k. l company"), class = "factor"), TMC2 = structure(c(2L, 
4L, 1L, 4L, 3L, 3L, 3L, 2L, 3L, 2L), .Label = c("ACE", "AMEX", 
"QBT", "TP oil"), class = "factor"), Status = structure(c(2L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("produce", "sent"
), class = "factor")), .Names = c("DRA", "Company.ID2", "TMC2", 
"Status"), class = "data.frame", row.names = c(NA, -10L))

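I tried the following loop:

for (i in 1:nrow(df1)) {
  if (df1$Company.ID1[i] == df2$Company.ID2[i] & df1$TMC1[i] == df2$TMC2[i] & df2$Status[i] == 'sent') {
    data1$new[i] <- 'sent'
  } else {
    data1$new <- 'delete'
  }
}

A company from df1$Company.ID1 may appear more than once in df2$Company.ID2, and the matching entries may be in different rows.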

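My expected output is as follows:

  1. Take company x's name from df1$Company.ID1
  2. Match it against df2$Company.ID2
  3. If it matches, check whether company x's df1$TMC1 matches df2$TMC2
  4. If 1 & 2 are TRUE, look up company x in df2
  5. Check company x's status: df2$Status=='sent'
  6. If TRUE, create a new column df1$new, take the DRA number from df2$DRA and store it for that company x

Thanks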

3 answers:

Answer 0 (score: 1)

Here is a merge-and-identify approach:

#Merge data on ID and TMC columns
m <- merge(df2, df, by.x=c("CompanyID2", "TMC2"),
      by.y=c("CompanyID1", "TMC1"))

#If "sent" use DRA, if not "delete"
m$Output <- ifelse(m$Status == "sent", as.character(m$DRA), "delete")

#Remove unnecessary columns
m[-(3:4)]
#   CompanyID2 TMC2 Output
# 1        ABC  QBT     16
# 2        BCD  ACE     13
# 3         jb  QBT delete
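
The merge above is an inner join, so rows of df without a match in df2 drop out of the result. If the goal is to fill a new column on df itself, a left join may be closer to what the question describes. The following is a minimal sketch, not the answerer's code. It assumes the data are re-typed from the question's tables as plain character columns; the names m and new are illustrative.

```r
# Data re-typed from the question's tables (assumption: character, not factor).
df <- data.frame(
  CompanyID1 = c("ABC company", "BCD company", "jb hi fi", "ABC company",
                 "FB Company", "LL company", "j k", "k. l company",
                 "1 to 1 lts", "2 in 1 pty ltd."),
  TMC1 = c("QBT", "G W TMC", "QBT", "GW TMC", "AMEX", "AMEX", "QBT",
           "TP oil", "TP oil", "AMEX"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  DRA = 11:20,
  CompanyID2 = c("2 in 1 pty ltd.", "1 to 1 lts", "BCD company",
                 "k. l company", "jb hi fi", "ABC company", "j k",
                 "FB Company", "facebook pty", "2 in 1 pty ltd."),
  TMC2 = c("AMEX", "TP oil", "ACE", "TP oil", "QBT", "QBT", "QBT",
           "AMEX", "QBT", "AMEX"),
  Status = c("sent", "produce", "sent", "sent", "produce", "sent", "sent",
             "sent", "sent", "produce"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every row of df, matched or not.
m <- merge(df, df2,
           by.x = c("CompanyID1", "TMC1"),
           by.y = c("CompanyID2", "TMC2"),
           all.x = TRUE)

# "sent" -> DRA number, any other status -> "delete".
# Unmatched rows have Status NA, so ifelse() leaves them as NA.
m$new <- ifelse(m$Status == "sent", as.character(m$DRA), "delete")

m[c("CompanyID1", "TMC1", "new")]
```

With the question's data this gives 11 rows rather than 10, because "2 in 1 pty ltd." with AMEX appears twice in df2 (DRA 11 and DRA 20).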

Answer 1 (score: 1)

We can use dplyr:

library(dplyr)
inner_join(df2, df1, by = c("CompanyID2" = "CompanyID1", "TMC2" = "TMC1")) %>%
      mutate(Output = ifelse(Status == "sent", DRA, "delete"))
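
One thing worth checking before a join like this is whether the key pair is unique in df2, since a repeated pair yields more than one output row per match. The sketch below is a hedged illustration rather than part of the answer. It assumes dplyr is installed and the data are re-typed from the question's tables as character columns; dupes, res and new are illustrative names. It uses left_join() so that unmatched rows of df1 are kept, and if_else(), which needs both branches to have the same type, hence as.character(DRA).

```r
library(dplyr)

# Data re-typed from the question's tables (assumption: character, not factor).
df1 <- data.frame(
  CompanyID1 = c("ABC company", "BCD company", "jb hi fi", "ABC company",
                 "FB Company", "LL company", "j k", "k. l company",
                 "1 to 1 lts", "2 in 1 pty ltd."),
  TMC1 = c("QBT", "G W TMC", "QBT", "GW TMC", "AMEX", "AMEX", "QBT",
           "TP oil", "TP oil", "AMEX"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  DRA = 11:20,
  CompanyID2 = c("2 in 1 pty ltd.", "1 to 1 lts", "BCD company",
                 "k. l company", "jb hi fi", "ABC company", "j k",
                 "FB Company", "facebook pty", "2 in 1 pty ltd."),
  TMC2 = c("AMEX", "TP oil", "ACE", "TP oil", "QBT", "QBT", "QBT",
           "AMEX", "QBT", "AMEX"),
  Status = c("sent", "produce", "sent", "sent", "produce", "sent", "sent",
             "sent", "sent", "produce"),
  stringsAsFactors = FALSE
)

# Key pairs that occur more than once in df2.
dupes <- df2 %>%
  count(CompanyID2, TMC2) %>%
  filter(n > 1)

# Keep every row of df1; unmatched rows get NA in the df2 columns and in new.
res <- df1 %>%
  left_join(df2, by = c("CompanyID1" = "CompanyID2", "TMC1" = "TMC2")) %>%
  mutate(new = if_else(Status == "sent", as.character(DRA), "delete"))
```

Here dupes contains a single pair, "2 in 1 pty ltd." with AMEX, which is listed in df2 once as sent and once as produce. How to resolve such a conflict is a decision the question does not specify.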

Answer 2 (score: 1)

Another option is to use sqldf.
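
A minimal sketch of what an sqldf version might look like. This is not the answerer's code. It assumes the sqldf package is installed and uses only three rows from each of the question's tables to stay short; res and new are illustrative names.

```r
library(sqldf)

# Three rows from each of the question's tables (assumption: character columns).
df1 <- data.frame(
  CompanyID1 = c("ABC company", "jb hi fi", "LL company"),
  TMC1 = c("QBT", "QBT", "AMEX"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  DRA = c(16L, 15L, 19L),
  CompanyID2 = c("ABC company", "jb hi fi", "facebook pty"),
  TMC2 = c("QBT", "QBT", "QBT"),
  Status = c("sent", "produce", "sent"),
  stringsAsFactors = FALSE
)

# Inner join on both key columns; CASE maps the status to the output value.
# CAST keeps the new column as text so it can hold both numbers and 'delete'.
res <- sqldf("
  SELECT a.CompanyID1, a.TMC1,
         CASE WHEN b.Status = 'sent' THEN CAST(b.DRA AS TEXT)
              ELSE 'delete' END AS new
  FROM df1 AS a
  JOIN df2 AS b
    ON a.CompanyID1 = b.CompanyID2 AND a.TMC1 = b.TMC2
")
```

Swapping JOIN for LEFT JOIN would keep the unmatched "LL company" row as well, though the ELSE branch would then label it 'delete' unless the CASE also tests for a missing status.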