I need to update a spreadsheet with 1000 rows, but I have a problem.
I have two datasets:
DF
CompanyID1       TMC1
ABC company      QBT
BCD company      G W TMC
jb hi fi         QBT
ABC company      GW TMC
FB Company       AMEX
LL company       AMEX
j k              QBT
k. l company     TP oil
1 to 1 lts       TP oil
2 in 1 pty ltd.  AMEX
DF2
DRA  CompanyID2       TMC2    Status
11   2 in 1 pty ltd.  AMEX    sent
12   1 to 1 lts       TP oil  produce
13   BCD company      ACE     sent
14   k. l company     TP oil  sent
15   jb hi fi         QBT     produce
16   ABC company      QBT     sent
17   j k              QBT     sent
18   FB Company       AMEX    sent
19   facebook pty     QBT     sent
20   2 in 1 pty ltd.  AMEX    produce
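What I want to achieve: first look up the df2$CompanyID2 value in df$CompanyID1. If it matches, check whether df$TMC1 matches df2$TMC2. If that also matches, check the status: if df2$status=='sent', create a new column df$new and return the df2$DRA value; if df2$status=='produce', df$new should contain 'delete'.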
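The rule above can be applied to every row at once, without a loop. This is a minimal base-R sketch, not code from the question. It assumes the two tables are rebuilt as plain character columns (not factors) with the column names of the printed tables, combines company and TMC into one lookup key (the `|` separator is an assumption that it never occurs in the names), and gives NA to rows of `df` with no matching pair in `df2`, since the question does not say what those rows should get.

```r
# Data from the question, as plain character columns
df <- data.frame(
  CompanyID1 = c("ABC company", "BCD company", "jb hi fi", "ABC company",
                 "FB Company", "LL company", "j k", "k. l company",
                 "1 to 1 lts", "2 in 1 pty ltd."),
  TMC1 = c("QBT", "G W TMC", "QBT", "GW TMC", "AMEX", "AMEX", "QBT",
           "TP oil", "TP oil", "AMEX"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  DRA = 11:20,
  CompanyID2 = c("2 in 1 pty ltd.", "1 to 1 lts", "BCD company", "k. l company",
                 "jb hi fi", "ABC company", "j k", "FB Company",
                 "facebook pty", "2 in 1 pty ltd."),
  TMC2 = c("AMEX", "TP oil", "ACE", "TP oil", "QBT", "QBT", "QBT", "AMEX",
           "QBT", "AMEX"),
  Status = c("sent", "produce", "sent", "sent", "produce", "sent", "sent",
             "sent", "sent", "produce"),
  stringsAsFactors = FALSE)

# One key per row, so company and TMC must match together
key1 <- paste(df$CompanyID1, df$TMC1, sep = "|")
key2 <- paste(df2$CompanyID2, df2$TMC2, sep = "|")

# Row of df2 with the same key, wherever it is (first hit; NA if none)
hit <- match(key1, key2)

# 'sent' -> that row's DRA, 'produce' -> "delete", no match -> NA (assumed)
df$new <- ifelse(df2$Status[hit] == "sent", as.character(df2$DRA[hit]), "delete")
df
```

With this data ABC company/QBT gets 16, jb hi fi and 1 to 1 lts get "delete", and 2 in 1 pty ltd./AMEX gets 11 because `match()` stops at the first of its two rows in `df2`.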
Example:
"ABC company" from df2$CompanyID2 exists in df1$CompanyID1, and for ABC company df$TMC1 matches df2$TMC2. Since df2$status=='sent', the result should be:
df$new <- 16
I would really appreciate your help. This would save a lot of time that I could use for other production work. Thanks.
dput(DF1)
structure(list(Company.ID1 = structure(c(3L, 4L, 7L, 3L, 5L,
9L, 6L, 8L, 1L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.",
"ABC company", "BCD company", "FB Company", "j k ", "jb hi fi",
"k. l company", "LL company"), class = "factor"), TMC1 = structure(c(4L,
2L, 4L, 3L, 1L, 1L, 4L, 5L, 5L, 1L), .Label = c("AMEX", "G W TMC",
"GW TMC", "QBT", "TP oil"), class = "factor")), .Names = c("Company.ID1",
"TMC1"), class = "data.frame", row.names = c(NA, -10L))
dput(DF2)
structure(list(DRA = 11:20, Company.ID2 = structure(c(2L, 1L,
4L, 9L, 8L, 3L, 7L, 6L, 5L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.",
"ABC company", "BCD company", "facebook pty", "FB Company", "j k ",
"jb hi fi", "k. l company"), class = "factor"), TMC2 = structure(c(2L,
4L, 1L, 4L, 3L, 3L, 3L, 2L, 3L, 2L), .Label = c("ACE", "AMEX",
"QBT", "TP oil"), class = "factor"), Status = structure(c(2L,
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("produce", "sent"
), class = "factor")), .Names = c("DRA", "Company.ID2", "TMC2",
"Status"), class = "data.frame", row.names = c(NA, -10L))
I tried the following loop:
for (i in 1:nrow(df1)) {
  if (df1$Company.ID1[i] == df2$Company.ID2[i] & df1$TMC1[i] == df2$TMC2[i] & df2$Status[i] == 'sent') {
    data1$new[i] <- 'sent'
  } else {
    data1$new[i] <- 'delete'
  }
}
But the companies in df1$Company.ID1 can appear more than once in df2$Company.ID2, and they may also be in different rows.
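The row-alignment problem shows up on the company columns alone. In this sketch the two character vectors are copied from the tables (with the trailing space of "j k " dropped). Comparing row i with row i, as the loop does, finds only two matches, while `%in%` and `match()` look each company up wherever it sits in the other table.

```r
company1 <- c("ABC company", "BCD company", "jb hi fi", "ABC company",
              "FB Company", "LL company", "j k", "k. l company",
              "1 to 1 lts", "2 in 1 pty ltd.")
company2 <- c("2 in 1 pty ltd.", "1 to 1 lts", "BCD company", "k. l company",
              "jb hi fi", "ABC company", "j k", "FB Company",
              "facebook pty", "2 in 1 pty ltd.")

# Row-by-row comparison, as in the loop: only rows 7 and 10 happen to line up
sum(company1 == company2)
# Position-independent check: does each df1 company occur anywhere in df2?
sum(company1 %in% company2)
# Row of df2 where each df1 company first appears (NA if absent)
match(company1, company2)
```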
My expected output is: if df2$Company.ID2 matches df1$Company.ID1, df2$TMC2 matches data1$TMC1, and df2$Status=='sent', then return df2$DRA and store it against that company. Thanks.
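One case the expected output leaves open is a company/TMC pair that occurs more than once in df2: 2 in 1 pty ltd. with AMEX appears twice, as sent (DRA 11) and as produce (DRA 20), and `match()` would simply take the first row. This sketch uses a two-row excerpt of the data and collects every matching row with `which()`. The policy is an assumption, not something the question specifies: any 'sent' row wins (several are comma-separated), only 'produce' rows give "delete", and no match gives NA.

```r
# Excerpt of the question's data: one duplicated pair and one unmatched company
df <- data.frame(CompanyID1 = c("2 in 1 pty ltd.", "LL company"),
                 TMC1 = c("AMEX", "AMEX"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(DRA = c(11L, 20L),
                  CompanyID2 = c("2 in 1 pty ltd.", "2 in 1 pty ltd."),
                  TMC2 = c("AMEX", "AMEX"),
                  Status = c("sent", "produce"),
                  stringsAsFactors = FALSE)

key1 <- paste(df$CompanyID1, df$TMC1, sep = "|")
key2 <- paste(df2$CompanyID2, df2$TMC2, sep = "|")

# For each df row, the indices of ALL df2 rows with the same company and TMC
matches <- lapply(key1, function(k) which(key2 == k))

# Assumed policy: any 'sent' row wins; only 'produce' rows -> "delete"; none -> NA
df$new <- vapply(matches, function(idx) {
  if (length(idx) == 0) return(NA_character_)
  sent <- idx[df2$Status[idx] == "sent"]
  if (length(sent) > 0) paste(df2$DRA[sent], collapse = ",") else "delete"
}, character(1))
df
```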
Answer 0 (score: 1)
Here is a merge-and-ifelse approach:
#Merge data on ID and TMC columns
m <- merge(df2, df, by.x=c("CompanyID2", "TMC2"),
by.y=c("CompanyID1", "TMC1"))
#If "sent" use DRA, if not "delete"
m$Output <- ifelse(m$Status == "sent", as.character(m$DRA), "delete")
#Remove unnecessary columns
m[-(3:4)]
# CompanyID2 TMC2 Output
# 1 ABC QBT 16
# 2 BCD ACE 13
# 3 jb QBT delete
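To see what the merge approach returns on the full data from the dput() output, here is a sketch that rebuilds both tables with those column names (Company.ID1, TMC1, Company.ID2, TMC2) as character columns rather than factors, an assumption that sidesteps factor-level differences. Because the join needs both columns to match, BCD company (ACE in DF2, G W TMC in DF) and facebook pty drop out.

```r
df1 <- data.frame(
  Company.ID1 = c("ABC company", "BCD company", "jb hi fi", "ABC company",
                  "FB Company", "LL company", "j k", "k. l company",
                  "1 to 1 lts", "2 in 1 pty ltd."),
  TMC1 = c("QBT", "G W TMC", "QBT", "GW TMC", "AMEX", "AMEX", "QBT",
           "TP oil", "TP oil", "AMEX"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  DRA = 11:20,
  Company.ID2 = c("2 in 1 pty ltd.", "1 to 1 lts", "BCD company", "k. l company",
                  "jb hi fi", "ABC company", "j k", "FB Company",
                  "facebook pty", "2 in 1 pty ltd."),
  TMC2 = c("AMEX", "TP oil", "ACE", "TP oil", "QBT", "QBT", "QBT", "AMEX",
           "QBT", "AMEX"),
  Status = c("sent", "produce", "sent", "sent", "produce", "sent", "sent",
             "sent", "sent", "produce"),
  stringsAsFactors = FALSE)

# Inner join on both key columns: a df2 row survives only if its company
# and TMC occur together in some df1 row
m <- merge(df2, df1, by.x = c("Company.ID2", "TMC2"),
           by.y = c("Company.ID1", "TMC1"))
m$Output <- ifelse(m$Status == "sent", as.character(m$DRA), "delete")
# Sort by DRA and keep the identifying columns plus the result
m <- m[order(m$DRA), c("DRA", "Company.ID2", "TMC2", "Output")]
m
```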
Answer 1 (score: 1)
We can use dplyr:
library(dplyr)
inner_join(df2, df1, by = c("CompanyID2" = "CompanyID1", "TMC2" = "TMC1")) %>%
mutate(Output = ifelse(Status == "sent", DRA, "delete"))
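Both answers put a number (DRA) and a string ("delete") into the same ifelse() call. Base R's ifelse() fills a single result vector from its yes and no values, so the numbers are coerced and the whole Output column comes back as character. A small check using the Status and DRA values of three DF2 rows:

```r
# Status and DRA of DF2 rows with DRA 16, 15 and 18
status <- c("sent", "produce", "sent")
dra    <- c(16L, 15L, 18L)

# Integer 'yes' values and a character 'no' value end up in one character vector
out <- ifelse(status == "sent", dra, "delete")
out          # "16" "delete" "18"
class(out)   # "character"
```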
Answer 2 (score: 1)
Another option, using sqldf: