在df1中,我需要将msec的值替换为df2中的相应值。
df1 <- data.frame(ID=c('rs', 'rs', 'rs', 'tr','tr','tr'), cond=c(1,1,2,1,1,2),
block=c(2,2,4,2,2,4), correct=c(1,0,1,1,1,0), msec=c(456,678,756,654,625,645))
df2 <- data.frame(ID=c('rs', 'rs', 'tr','tr'), cond=c(1,2,1,2),
block=c(2,4,2,4), mean=c(545,664,703,765))
在df1中,如果correct==0
,则引用df2
,其匹配值为ID
,cond
和block
。将msec
中df1
的值替换为mean
中df2
的相应值。
例如,df1中的第二行有correct==0
。因此,在df2
中找到ID=='rs'
,cond==1
,block==2
的相应行,并使用mean(mean=545
)的值替换msec的值( msec=678
)。请注意,在df1中,ID,block和cond的组合可以重复,但每个组合在df2中只出现一次。
答案 0 :(得分:3)
使用data.table
包:
# load the 'data.table' package
library(data.table)
# convert the data.frame's to data.table's
setDT(df1)
setDT(df2)
# update df1 by reference with a join with df2
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]
给出:
> df1
ID cond block correct msec
1: rs 1 2 1 456
2: rs 1 2 0 545
3: rs 2 4 1 756
4: tr 1 2 1 654
5: tr 1 2 1 625
6: tr 2 4 0 765
注意:上述代码将更新df1
,而不是创建新的数据帧,这样可以提高内存效率。
答案 1 :(得分:2)
一种选择是将基数R与interaction()
和match()
一起使用。怎么样:
df1[which(df1$correct==0),"msec"] <- df2[match(interaction(df1[which(df1$correct==0),c("ID","cond","block")]),
interaction(df2[,c("ID","cond", "block")])),
"mean"]
df1
# ID cond block correct msec
#1 rs 1 2 1 456
#2 rs 1 2 0 545
#3 rs 2 4 1 756
#4 tr 1 2 1 654
#5 tr 1 2 1 625
#6 tr 2 4 0 765
我们会在correct == 0
df2$mean
列
编辑:另一种选择是sql合并,它可能如下所示:
library(sqldf)
merged <- sqldf('SELECT l.ID, l.cond, l.block, l.correct,
case when l.correct == 0 then r.mean else l.msec end as msec
FROM df1 as l
LEFT JOIN df2 as r
ON l.ID = r.ID AND l.cond = r.cond AND l.block = r.block')
merged
ID cond block correct msec
1 rs 1 2 1 456
2 rs 1 2 0 545
3 rs 2 4 1 756
4 tr 1 2 1 654
5 tr 1 2 1 625
6 tr 2 4 0 765
答案 2 :(得分:1)
dplyr
。此解决方案left_join
所有列和正确时mutate
为0。
library(dplyr)
left_join(df1,df2)%>%
mutate(msec=ifelse(correct==0,mean,msec))%>%
select(-mean)
ID cond block correct msec
1 rs 1 2 1 456
2 rs 1 2 0 545
3 rs 2 4 1 756
4 tr 1 2 1 654
5 tr 1 2 1 625
6 tr 2 4 0 765