Based on a set of conditions

Time: 2017-04-21 14:31:31

Tags: r

In df1, I need to replace the values of msec with the corresponding values from df2.

df1 <- data.frame(ID=c('rs', 'rs', 'rs', 'tr','tr','tr'), cond=c(1,1,2,1,1,2), 
block=c(2,2,4,2,2,4), correct=c(1,0,1,1,1,0), msec=c(456,678,756,654,625,645))

df2 <- data.frame(ID=c('rs', 'rs', 'tr','tr'), cond=c(1,2,1,2), 
block=c(2,4,2,4), mean=c(545,664,703,765))

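In df1, if correct == 0, look up the row of df2 with matching values of ID, cond and block, and replace the msec value in df1 with the corresponding mean value from df2.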

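For example, the second row of df1 has correct == 0. So find the row in df2 with ID == 'rs', cond == 1 and block == 2, and replace the value of msec (msec = 678) with the value of mean (mean = 545). Note that in df1 a combination of ID, block and cond can be repeated, but each combination appears only once in df2.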
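To make the rule concrete, here is a minimal base-R sketch of the lookup described above (not one of the answers below). It builds a composite key from ID, cond and block with paste() and uses match() to find each df1 row's partner in df2; the variable names and the "|" separator are illustrative assumptions, with "|" assumed never to occur in ID.

```r
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'), cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4), correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'), cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4), mean = c(545, 664, 703, 765))

# one composite key per row, e.g. "rs|1|2" (separator is an assumption)
key1 <- paste(df1$ID, df1$cond, df1$block, sep = "|")
key2 <- paste(df2$ID, df2$cond, df2$block, sep = "|")

# only rows with correct == 0 are replaced
to_fix <- df1$correct == 0

# position of the matching df2 row; each key occurs once in df2
idx <- match(key1[to_fix], key2)

df1$msec[to_fix] <- df2$mean[idx]
df1$msec
```

Because every key is unique in df2, match() returns exactly one position per row to fix, so repeated keys in df1 are handled without any special casing.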

3 answers:

Answer 0 (score: 3)

Using the data.table package:

# load the 'data.table' package
library(data.table)

# convert the data.frame's to data.table's
setDT(df1)
setDT(df2)

# update df1 by reference with a join with df2
# (note: df2[, correct := 0] also adds a 'correct' column to df2 by reference)
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]

gives:

> df1
   ID cond block correct msec
1: rs    1     2       1  456
2: rs    1     2       0  545
3: rs    2     4       1  756
4: tr    1     2       1  654
5: tr    1     2       1  625
6: tr    2     4       0  765

Note: the code above updates df1 by reference instead of creating a new data frame, which is more memory-efficient.
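One side effect of the join above is that `df2[, correct := 0]` permanently adds a `correct` column to df2. A minimal sketch of a variant (not part of the original answer) that joins on ID, cond and block only and applies the correct == 0 condition inside j, so df2 is left untouched:

```r
library(data.table)

df1 <- data.table(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'), cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4), correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.table(ID = c('rs', 'rs', 'tr', 'tr'), cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4), mean = c(545, 664, 703, 765))

# update join on the key columns; inside j, 'correct' and 'msec' refer to the
# matched rows of df1 and 'i.mean' to the joined df2 value
df1[df2, on = .(ID, cond, block),
    msec := ifelse(correct == 0, i.mean, msec)]

df1$msec
names(df2)   # still ID, cond, block, mean
```

The trade-off is that every matched row is rewritten (with its own value when correct != 0) rather than only the correct == 0 rows, which is harmless here and keeps df2 read-only.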

Answer 1 (score: 2)

一种选择是将基数R与interaction()match()一起使用。怎么样:

df1[which(df1$correct==0),"msec"] <- df2[match(interaction(df1[which(df1$correct==0),c("ID","cond","block")]), 
                                               interaction(df2[,c("ID","cond", "block")])),
                                         "mean"]

df1
#  ID cond block correct msec
#1 rs    1     2       1  456
#2 rs    1     2       0  545
#3 rs    2     4       1  756
#4 tr    1     2       1  654
#5 tr    1     2       1  625
#6 tr    2     4       0  765

This overwrites msec with the matching df2$mean in the rows where correct == 0.
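To see what the two interaction() calls are actually matching, here is a small sketch that prints the intermediate keys (the names rows, k1, k2 and pos are illustrative). interaction() pastes the levels of ID, cond and block together with "." as the default separator, and match() compares those labels as character strings.

```r
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'), cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4), correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'), cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4), mean = c(545, 664, 703, 765))

rows <- which(df1$correct == 0)                      # rows 2 and 6
k1 <- interaction(df1[rows, c("ID", "cond", "block")])
k2 <- interaction(df2[, c("ID", "cond", "block")])

as.character(k1)   # keys of the rows to fix
as.character(k2)   # one key per df2 row
pos <- match(k1, k2)  # factors are compared via their labels
pos
```

If ID values could themselves contain ".", two different combinations might produce the same label; passing another separator through interaction()'s sep argument is one way to avoid that.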

Edit: another option is an SQL merge with sqldf, which could look like this:

library(sqldf)
merged <- sqldf('SELECT l.ID, l.cond, l.block, l.correct,
                        case when l.correct == 0 then r.mean else l.msec end as msec
                FROM df1 as l
                LEFT JOIN df2 as r
                ON l.ID = r.ID AND l.cond = r.cond AND l.block = r.block')


merged
  ID cond block correct msec
1 rs    1     2       1  456
2 rs    1     2       0  545
3 rs    2     4       1  756
4 tr    1     2       1  654
5 tr    1     2       1  625
6 tr    2     4       0  765
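For comparison, a hedged base-R sketch of the same LEFT JOIN plus CASE WHEN, using merge() instead of sqldf. Because merge() sorts its result by the key columns, the sketch stores the original row order in a helper column (row_id, a name introduced here for illustration) and restores it afterwards.

```r
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'), cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4), correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'), cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4), mean = c(545, 664, 703, 765))

df1$row_id <- seq_len(nrow(df1))   # remember the original row order

# LEFT JOIN df2 ON ID, cond, block
m <- merge(df1, df2, by = c("ID", "cond", "block"), all.x = TRUE)
m <- m[order(m$row_id), ]

# CASE WHEN correct == 0 THEN mean ELSE msec END
m$msec <- ifelse(m$correct == 0, m$mean, m$msec)

merged <- m[, c("ID", "cond", "block", "correct", "msec")]
rownames(merged) <- NULL
merged
```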

Answer 2 (score: 1)

A dplyr solution: left_join on all the shared columns (ID, cond and block), then mutate msec to mean where correct is 0.

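library(dplyr)
# join on the shared key columns, then take mean only where correct == 0
left_join(df1, df2, by = c("ID", "cond", "block")) %>%
  mutate(msec = ifelse(correct == 0, mean, msec)) %>%
  select(-mean)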

  ID cond block correct msec
1 rs    1     2       1  456
2 rs    1     2       0  545
3 rs    2     4       1  756
4 tr    1     2       1  654
5 tr    1     2       1  625
6 tr    2     4       0  765
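One behaviour of the join-then-ifelse pattern worth knowing: if a row with correct == 0 has no matching ID/cond/block in df2, the left join yields mean = NA and msec becomes NA. A minimal sketch that falls back to the original msec in that case with dplyr's coalesce(); the row with ID 'zz' is hypothetical test data, not from the question.

```r
library(dplyr)

# hypothetical data: the 'zz' row has no partner in df2
df1 <- data.frame(ID = c('rs', 'rs', 'zz'), cond = c(1, 1, 9),
                  block = c(2, 2, 9), correct = c(1, 0, 0),
                  msec = c(456, 678, 500))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'), cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4), mean = c(545, 664, 703, 765))

result <- df1 %>%
  left_join(df2, by = c("ID", "cond", "block")) %>%
  # keep the original msec when no df2 row matched (mean is NA)
  mutate(msec = ifelse(correct == 0, coalesce(mean, msec), msec)) %>%
  select(-mean)
result$msec
```

Whether silently keeping the old value or surfacing NA is preferable depends on the data; NA makes unmatched keys visible, while coalesce() keeps the column complete.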