Question

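I have the following task: replace the values of variable V1 in data frame A with the values of the same variable from data frame B. Here is some simulated data: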

set.seed(123)

A<-data.frame(id1=sample(1:10,10),id2=sample(1:10,10),V1=rnorm(10),V2=rnorm(10))
###create dataframe B
B<-A[sample(1:10,5),1:3]
###change values to be updated in df A
B$V1<-rnorm(5)
###create a row which is not in A, to make it more interesting
B<-rbind(B,c(11,12,rnorm(1)))

Below is my current, non-ideal solution, which I would like to make more concise:

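library(dplyr)
###join B onto A; the shared column V1 becomes V1.x (from A) and V1.y (from B)
temp<-left_join(A,B,by=c("id1","id2"))
###overwrite V1.x wherever B supplied a value
temp[!is.na(temp$V1.y),"V1.x"]<-temp[!is.na(temp$V1.y),"V1.y"]

###drop the helper column and restore the original name
A<-temp[,setdiff(colnames(temp),"V1.y")]
colnames(A)[colnames(A) %in% "V1.x"]<-"V1"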

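I would like to avoid creating a temporary object and instead modify df A directly. The solution should also scale to replacing values in several columns of A. I have in mind something like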

A[expression1,desired_cols]<-B[expression2,desired_cols]

where expression1 and expression2 select the matching rows in each df, and desired_cols holds the names of the columns to replace.

Answer 1

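We can use a join from data.table and update the column in 'A' with the corresponding i. column from the second dataset ('B'). The update happens by reference, so no temporary object is created, and rows of 'B' with no match in 'A' are ignored: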

library(data.table)
setDT(A)[B,  V1 := i.V1, on = .(id1, id2)]

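If we need to replace multiple columns, store the names of the columns to replace. Each of these columns must also be present in 'B'; in the example data 'B' only carries V1, so V2 would have to be added to 'B' first: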

nm1 <- names(A)[3:4]
nm2 <- paste0("i.", nm1)
setDT(A)[B, (nm1) := mget(nm2), on = .(id1, id2)]
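
As a hedged sketch of the multi-column case, the column names can be derived from whatever 'A' and 'B' share apart from the join keys, so the update never asks for an i. column that 'B' lacks. The toy tables and the `keys` variable are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Toy data (hypothetical): B updates one row of A and has one row with no match in A
A <- data.table(id1 = 1:3, id2 = 4:6, V1 = c(0.1, 0.2, 0.3), V2 = c(1, 2, 3))
B <- data.table(id1 = c(2L, 9L), id2 = c(5L, 9L), V1 = c(9.9, 7.7), V2 = c(8.8, 6.6))

keys <- c("id1", "id2")

# Columns to update: everything the two tables share, minus the join keys
nm1 <- setdiff(intersect(names(A), names(B)), keys)
nm2 <- paste0("i.", nm1)

# Update join: A is modified by reference, unmatched rows of B are dropped
A[B, (nm1) := mget(nm2), on = keys]
print(A)
```

Only the row with id1 = 2, id2 = 5 changes; the (9, 9) row of 'B' has no partner in 'A' and is not appended.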

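Or, if we stay with left_join, coalesce is a cleaner way to pick the value from 'B' where one exists and fall back to the value from 'A' otherwise: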

library(dplyr)
left_join(A, B, by = c('id1', 'id2')) %>%
        transmute(id1, id2, V1 = coalesce(V1.y, V1.x), V2)
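
For the indexing form the question asks about, A[expression1, desired_cols] <- B[expression2, desired_cols], one possible base R sketch uses match() on a composite key. This is not part of the original answer; the toy data and the helper `row_key` are hypothetical, and the sketch assumes that each id1/id2 pair occurs at most once in each data frame and that the keys never contain the "|" separator.

```r
# Toy data (hypothetical): the last row of B has no counterpart in A
A <- data.frame(id1 = 1:4, id2 = 5:8, V1 = c(0.1, 0.2, 0.3, 0.4), V2 = c(1, 2, 3, 4))
B <- data.frame(id1 = c(3, 1, 11), id2 = c(7, 5, 12), V1 = c(30, 10, 99))

key_cols <- c("id1", "id2")
desired_cols <- "V1"

# Collapse the key columns into one string per row, e.g. "3|7"
row_key <- function(df) do.call(paste, c(as.list(df[key_cols]), sep = "|"))

# For each row of B, the position of the matching row in A (NA if there is none)
idx <- match(row_key(B), row_key(A))
hit <- !is.na(idx)

# expression1 is idx[hit], expression2 is hit
A[idx[hit], desired_cols] <- B[hit, desired_cols]
print(A)
```

Because `desired_cols` is an ordinary character vector, the same assignment extends to several columns as long as all of them exist in both data frames.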
