Create averaged time bins from an existing data frame

Posted: 2014-03-20 17:33:09

Tags: r dataframe average

I have the following data frame called 'EasyScaled':

str(EasyScaled)
'data.frame':   675045 obs. of  3 variables:
$ Trial           : chr  "1_easy.wav" "1_easy.wav" "1_easy.wav" "1_easy.wav" ...
$ TrialTime       : num  3000 3001 3002 3003 3004 ...
$ PupilBaseCorrect: num  0.784 0.781 0.78 0.778 0.777 ...

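The numeric variable 'TrialTime' gives the time of each data point (3000 = 3000 ms, 3001 = 3001 ms, etc.), 'PupilBaseCorrect' is my dependent variable, and the 'Trial' variable identifies the experimental trial.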

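I want to create a new object that first divides my data into 3 time bins (TimeBin1 = 3000-8000 ms, TimeBin2 = 8001-13000 ms, TimeBin3 = 13001-18000 ms) and then computes the mean of 'PupilBaseCorrect' in each time bin for each trial, so that I end up with something like this (the values shown are illustrative):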

 Trial        TimeBin1     TimeBin2     TimeBin3
 1_easy       0.784        0.876        0.767 
 34_easy      0.781        0.872        0.765
 35_easy      0.78         0.871        0.762 
 ...etc       ...etc       ...etc       ....etc
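The three bins described above are closed on the right (8000, 13000 and 18000 ms belong to TimeBin1, TimeBin2 and TimeBin3 respectively), which matches the default right = TRUE behaviour of cut(). A minimal sketch, assuming TrialTime holds whole milliseconds; the lower break of 2999 is an arbitrary choice that only ensures 3000 lands in TimeBin1:

```r
# Boundary times from the question, plus one time inside the first bin
times <- c(3000, 5500, 8000, 8001, 13000, 13001, 18000)

# Right-closed intervals (2999, 8000], (8000, 13000], (13000, 18000]
# reproduce TimeBin1 = 3000-8000, TimeBin2 = 8001-13000, TimeBin3 = 13001-18000
bin <- cut(times,
           breaks = c(2999, 8000, 13000, 18000),
           labels = c("TimeBin1", "TimeBin2", "TimeBin3"))

print(data.frame(TrialTime = times, bin = bin))
```

Times outside 3000-18000 would get NA, so it is worth checking anyNA(bin) on the real data.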

I have tried using cut(), ddply() and some of the suggestions in this blog post: http://lamages.blogspot.co.uk/2012/01/say-it-in-r-with-by-apply-and-friends.html, but haven't managed to find the right code. I also tried this:

EasyTimeBin <- aggregate(PupilBaseCorrect ~ Trial + TrialTime[3000:8000, 8001:1300,1301:1800], data=EasyScaled, mean)

but got the following error:

Error in TrialTime[3000:8000, 8001:1300, 1301:1800] : 
incorrect number of dimensions
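The error comes from TrialTime[3000:8000, 8001:1300, 1301:1800]: inside the formula, TrialTime is a plain vector, so [ with three comma-separated indices asks for three dimensions that do not exist. (The second and third ranges also look like typos for 8001:13000 and 13001:18000, and even corrected they would select elements by position rather than by time value.) The formula interface of aggregate() needs a grouping column instead. A sketch, using a small made-up stand-in for EasyScaled and assuming right-closed bins as described in the question:

```r
# Hypothetical stand-in for EasyScaled: 2 trials, one sample per ms from 3000 to 18000
set.seed(1)
ms <- 3000:18000
EasyScaled <- data.frame(
  Trial = rep(c("1_easy.wav", "34_easy.wav"), each = length(ms)),
  TrialTime = rep(ms, times = 2),
  PupilBaseCorrect = rnorm(2 * length(ms), 0.78, 0.1),
  stringsAsFactors = FALSE
)

# Grouping column: (2999, 8000] -> TimeBin1, (8000, 13000] -> TimeBin2, (13000, 18000] -> TimeBin3
EasyScaled$TimeBin <- cut(EasyScaled$TrialTime,
                          breaks = c(2999, 8000, 13000, 18000),
                          labels = c("TimeBin1", "TimeBin2", "TimeBin3"))

# Long format: one mean per Trial x TimeBin combination
long <- aggregate(PupilBaseCorrect ~ Trial + TimeBin, data = EasyScaled, FUN = mean)
long$TimeBin <- as.character(long$TimeBin)

# Wide format: one row per Trial, one column per bin
EasyTimeBin <- reshape(long, idvar = "Trial", timevar = "TimeBin", direction = "wide")
# reshape() names the columns "PupilBaseCorrect.TimeBin1" etc.; drop the prefix
names(EasyTimeBin) <- sub("^PupilBaseCorrect\\.", "", names(EasyTimeBin))
print(EasyTimeBin)
```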

Any suggestions or advice would be much appreciated.

1 answer:

Answer 0 (score: 0)

cut() and ddply() would be the right tools if used well, but here is some quick-and-dirty base R that does what you need.

# Generate example data: 3 trials, each sampled at the same 9 time points
EasyScaled <- data.frame(
  Trial = paste0(c(sapply(1:3, function(x) rep(x, 9))), "_easy.wav"),
  TrialTime = c(sapply(seq_len(9)-1, function(x) (floor(x/3))*5000 + x%%3 + 3000)),
  PupilBaseCorrect = rnorm(27, 0.78, 0.1)
)

# Bin number per row: 1 = 3000-8000 ms, 2 = 8001-13000 ms, 3 = 13001-18000 ms
bin <- cut(EasyScaled$TrialTime, breaks = c(2999, 8000, 13000, 18000), labels = FALSE)

# group means of PupilBaseCorrect by Trial + time bin (names look like "1_easy.wav,1")
tmp <- tapply(EasyScaled$PupilBaseCorrect,
    paste0(EasyScaled$Trial, ',', bin), mean)

# melt & recast the array manually into a data frame
# (assumes every trial has exactly 3 bins, so names(tmp) come in groups of 3)
EasyTimeBin <- do.call(data.frame,
   append(list(row.names = NULL,
               Trial = gsub('\\.wav,.*', '', names(tmp)[3*seq_len(length(tmp)/3)])),
     structure(lapply(seq_len(3),
         function(x) tmp[3*(seq_len(length(tmp)/3)-1) + x]
       ), .Names = paste0("TimeBin", seq_len(3))
     )
   )
)

print(EasyTimeBin)
# Example output (rnorm() is unseeded, so your values will differ):
#   Trial   TimeBin1  TimeBin2  TimeBin3
# 1 1_easy 0.7471973 0.7850524 0.8939581
# 2 2_easy 0.8096973 0.8390587 0.7757359
# 3 3_easy 0.8151430 0.7855042 0.8081268
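For comparison, tapply() also accepts a list of grouping factors; it then returns the trial-by-bin matrix of means directly, so the manual recasting step is not needed. A sketch with made-up data, assuming the same right-closed bins (3000-8000, 8001-13000, 13001-18000 ms):

```r
set.seed(42)
# Hypothetical stand-in data: 3 trials, each sampled at the same 9 time points
EasyScaled <- data.frame(
  Trial = rep(paste0(1:3, "_easy.wav"), each = 9),
  TrialTime = rep(c(3000, 3001, 3002, 8000, 8001, 8002, 13000, 13001, 13002), times = 3),
  PupilBaseCorrect = rnorm(27, 0.78, 0.1),
  stringsAsFactors = FALSE
)

bin <- cut(EasyScaled$TrialTime, breaks = c(2999, 8000, 13000, 18000),
           labels = c("TimeBin1", "TimeBin2", "TimeBin3"))

# Rows = trials, columns = bins, cells = mean PupilBaseCorrect
m <- tapply(EasyScaled$PupilBaseCorrect, list(EasyScaled$Trial, bin), mean)

# Strip ".wav" from the trial names and turn the matrix into a data frame
EasyTimeBin <- data.frame(Trial = sub("\\.wav$", "", rownames(m)), m,
                          row.names = NULL, stringsAsFactors = FALSE)
print(EasyTimeBin)
```

Unlike the name-splitting approach, this does not rely on the sort order of pasted keys, and a trial with a missing bin would simply show NA in that cell.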