Specifying different subsets or intervals of a variable in data.table's by argument

Date: 2014-04-14 01:30:48

Tags: r data.table

I am working with the following reaction-time data (simplified for illustration):

>dt
   subject trialnum blockcode values.trialtype latency correct
1        1        1  practice        cueswitch    3020       1
2        1        1      test           cuerep    4284       1
3        1       21      test        cueswitch    2094       1
4        1       34      test           cuerep    3443       1
5        1       50      test       taskswitch    3313       1
6        2        1  practice        cueswitch    3020       1
7        2        1      test           cuerep    1109       1
8        2       21      test        cueswitch    3470       1
9        2       34      test           cuerep    2753       1
10       2       50      test       taskswitch    3321       1

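I have been using data.table to compute reaction-time variables for consecutive subsets of trials (defined by trialnum, which ranges from 1 to 170 in the full dataset):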

dt1=dt[blockcode=="test" & correct==1, list(
RT1=.SD[trialnum>=1 & trialnum<=30 & values.trialtype=="cuerep", mean(latency)],
RT2=.SD[trialnum>=31 & trialnum<=60 & values.trialtype=="cuerep", mean(latency)]
), by="subject"]

Output:

   subject     RT1     RT2
1:       1    4284    3443
2:       2    1109    2753

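However, once there are more than two or three subsets, creating a variable for each one becomes tedious. How can I specify these subsets more efficiently?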

1 Answer:

Answer 0 (score: 2)

Use findInterval or cut to bin trialnum, then group by the resulting bins.

An example:

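# set the key so the subset below uses binary search
setkey(dt, blockcode, correct, values.trialtype)
# the subset you want: correct "cuerep" trials in the test block
# (the value must match the data exactly, which uses 'cuerep')
dt1 <- dt[.('test', 1, 'cuerep')]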

# use cut  to define subsets

dt2 <- dt1[,list(latency = mean(latency)),
     by=list(subject, trialset = cut(trialnum,seq(0,180,by=30)))]
dt2
#    subject trialset latency
# 1:       1   (0,30]    4284
# 2:       1  (30,60]    3443
# 3:       2   (0,30]    1109
# 4:       2  (30,60]    2753

# If you want separate columns, it is as simple as using `dcast`
library(reshape2)

dcast(dt2,subject~trialset, value.var = 'latency')
#   subject (0,30] (30,60]
# 1       1   4284    3443
# 2       2   1109    2753
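
The answer names findInterval as an alternative to cut but only demonstrates cut. A minimal sketch of the findInterval variant follows, assuming the simplified data from the question (rebuilt here so the snippet runs on its own) and assuming bins that start at trials 1, 31, 61, and so on; the breaks vector and the res name are illustrative, not part of the original answer. Unlike cut, which by default produces right-closed factor levels such as (0,30], findInterval returns an integer bin index and treats intervals as left-closed, so for integer trial numbers these breaks give the same grouping.

```r
library(data.table)

# Rebuild the simplified example data shown in the question
dt <- data.table(
  subject          = rep(1:2, each = 5),
  trialnum         = rep(c(1, 1, 21, 34, 50), times = 2),
  blockcode        = rep(c("practice", "test", "test", "test", "test"), times = 2),
  values.trialtype = rep(c("cueswitch", "cuerep", "cueswitch", "cuerep", "taskswitch"),
                         times = 2),
  latency          = c(3020, 4284, 2094, 3443, 3313, 3020, 1109, 3470, 2753, 3321),
  correct          = 1
)

# Assumed bin boundaries: trials 1-30 fall in bin 1, 31-60 in bin 2, ...,
# and 151-170 in bin 6
breaks <- seq(1, 151, by = 30)

# Mean latency per subject and trial bin, for correct "cuerep" test trials
res <- dt[blockcode == "test" & correct == 1 & values.trialtype == "cuerep",
          .(latency = mean(latency)),
          by = .(subject, trialset = findInterval(trialnum, breaks))]
res
```

With the example data this should give one row per subject and bin, with trialset holding 1 or 2 instead of an interval label. The result can be reshaped to wide format with dcast in the same way as dt2 above.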