Update values in a df column using a condition on a column in another df, when a character column in both dfs can serve as the key

Time: 2013-02-10 13:24:11

Tags: r syntax indexing dataframe

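I need to selectively update the values in a column of one df (df1), depending on a condition being met in a column of another df (df2), which also supplies the update value for df1, i.e. the value in the df2 column. Both dfs have a column whose values are unique, and the unique values in df2 are a subset of the unique values in df1. The approach I tried was to take the unique column values in both dfs and turn them into row names, then use them to define a selection index that is created from df2 and applied to df1 to update the values. I got the syntax to work (finally!) by using a numeric subscript for the column, combined with the shared character-based row key index. Phew.

But is there a simpler, more efficient, more "R" way to do this than what I am attempting, perhaps using a built-in? I need this to scale. A test example follows: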

goo <- data.frame(Uids=c("UidD", "UidA", "UidC"), Payout=c(3,0,5), stringsAsFactors = FALSE)
moo <- data.frame(Uids=c("UidB", "UidC", "UidA", "UidD"), PayOut=0, stringsAsFactors = FALSE)
goo
  Uids Payout
1 UidD      3
2 UidA      0
3 UidC      5
moo
  Uids PayOut
1 UidB      0
2 UidC      0
3 UidA      0
4 UidD      0
# I want to update moo$PayOut with the value of goo$Payout, for matching Uids,
# when goo$Payout > 0, i.e. moo[4,2] <- goo[1,2]; moo[2,2] <- goo[3,2]
rownames(goo) <- goo$Uids
rownames(moo) <- moo$Uids
# I am trying to create and apply an index based on turning Uids into rownames
IndexToUpdate <- goo$Uids[goo$Payout>0]
IndexToUpdate
[1] "UidD" "UidC"
moo[IndexToUpdate, 2] <- goo[IndexToUpdate, 2]
# this works, but is there a better way to do it?
moo
     Uids PayOut
UidB UidB      0
UidC UidC      5
UidA UidA      0
UidD UidD      3
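
For comparison, the same update can be sketched without touching row names by looking up positions with base R's match(). This is only an illustration under the example's assumptions (unique Uids in both data frames); the helper names src, pos and ok are made up for the sketch.

```r
# Rebuild the example data exactly as in the question
goo <- data.frame(Uids = c("UidD", "UidA", "UidC"), Payout = c(3, 0, 5),
                  stringsAsFactors = FALSE)
moo <- data.frame(Uids = c("UidB", "UidC", "UidA", "UidD"), PayOut = 0,
                  stringsAsFactors = FALSE)

src <- goo[goo$Payout > 0, ]       # rows of goo that carry an update
pos <- match(src$Uids, moo$Uids)   # row position of each of those Uids in moo
ok  <- !is.na(pos)                 # guard against Uids that are missing from moo

# Overwrite only the matched rows; all other rows of moo keep their value
moo$PayOut[pos[ok]] <- src$Payout[ok]
moo
```

Because match() returns positions in moo, the row order of moo is left unchanged and no row names need to be set.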

1 answer:

Answer 0 (score: 3)

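I would use merge with all.x = TRUE. (The output below assumes the payout column is named PayOut in both data frames; the question's goo spells it Payout, in which case merge adds no .x/.y suffixes.)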

voo <- merge(moo, goo, by = "Uids", all.x = TRUE)
voo
#   Uids PayOut.x PayOut.y
# 1 UidA        0        0
# 2 UidB        0       NA
# 3 UidC        0        5
# 4 UidD        0        3

Then ifelse:

within(voo, PayOut <- ifelse(is.na(PayOut.y), PayOut.x, PayOut.y))
#   Uids PayOut.x PayOut.y PayOut
# 1 UidA        0        0      0
# 2 UidB        0       NA      0
# 3 UidC        0        5      5
# 4 UidD        0        3      3
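
The two steps above can be put together as a self-contained sketch. It assumes goo's column is renamed to PayOut so that merge produces the .x/.y suffixes; the rename, the use_new helper and the explicit > 0 check are illustrative additions, not part of the original answer.

```r
# Example data; goo's column is named PayOut here (an assumption, see above)
goo <- data.frame(Uids = c("UidD", "UidA", "UidC"), PayOut = c(3, 0, 5),
                  stringsAsFactors = FALSE)
moo <- data.frame(Uids = c("UidB", "UidC", "UidA", "UidD"), PayOut = 0,
                  stringsAsFactors = FALSE)

# Left join: every row of moo is kept, unmatched Uids get NA in PayOut.y
voo <- merge(moo, goo, by = "Uids", all.x = TRUE)

# Take goo's value only where a match exists and the value is positive
use_new <- !is.na(voo$PayOut.y) & voo$PayOut.y > 0
voo$PayOut <- ifelse(use_new, voo$PayOut.y, voo$PayOut.x)

# Drop the helper columns
result <- voo[, c("Uids", "PayOut")]
result
```

Note that merge sorts the result by the key by default, so the rows come back ordered by Uids rather than in moo's original order.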

The same thing using data.tables:

library(data.table)
GOO <- data.table(goo)
MOO <- data.table(moo)
setkey(GOO, Uids)
setkey(MOO, Uids)
VOO <- GOO[MOO]
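# Recent data.table versions name the duplicated column from MOO i.PayOut
# (older versions used PayOut.1)
VOO[, FinalPayout := ifelse(is.na(PayOut), i.PayOut, PayOut)]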
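
With a reasonably recent data.table, the update can also be sketched as an "update join" that modifies MOO by reference, without building an intermediate table. This is a different technique from the keyed join above, shown only as a sketch; it assumes the on= argument is available and that both tables use the column name PayOut.

```r
library(data.table)

# Example data as data.tables; both use the column name PayOut (an assumption)
GOO <- data.table(Uids = c("UidD", "UidA", "UidC"), PayOut = c(3, 0, 5))
MOO <- data.table(Uids = c("UidB", "UidC", "UidA", "UidD"), PayOut = 0)

# For each row of GOO with PayOut > 0, find the matching Uids in MOO and
# overwrite MOO's PayOut in place; i.PayOut refers to the column from GOO
MOO[GOO[PayOut > 0], on = "Uids", PayOut := i.PayOut]
MOO
```

Rows of MOO without a matching Uids are left untouched, and MOO keeps its original row order because no key is set.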