I have the following data frame named 'EasyScaled':
str(EasyScaled)
'data.frame': 675045 obs. of 3 variables:
$ Trial : chr "1_easy.wav" "1_easy.wav" "1_easy.wav" "1_easy.wav" ...
$ TrialTime : num 3000 3001 3002 3003 3004 ...
$ PupilBaseCorrect: num 0.784 0.781 0.78 0.778 0.777 ...
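The numeric variable 'TrialTime' gives the time of each data point (3000 = 3000 ms, 3001 = 3001 ms, etc.), 'PupilBaseCorrect' is my dependent variable, and 'Trial' identifies the experimental trial.
I would like to create a new object that first divides my data into 3 time bins (TimeBin1 = 3000-8000 ms, TimeBin2 = 8001-13000 ms, TimeBin3 = 13001-18000 ms) and then computes the mean of each time bin for each trial, so that I end up with something that looks like this (the values shown reflect 'PupilBaseCorrect'):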
Trial TimeBin1 TimeBin2 TimeBin3
1_easy 0.784 0.876 0.767
34_easy 0.781 0.872 0.765
35_easy 0.78 0.871 0.762
...etc ...etc ...etc ....etc
I have tried using cut(), ddply(), and some of the suggestions on this blog, http://lamages.blogspot.co.uk/2012/01/say-it-in-r-with-by-apply-and-friends.html, but have not been able to find the correct code. I also tried this:
EasyTimeBin <- aggregate(PupilBaseCorrect ~ Trial + TrialTime[3000:8000, 8001:1300,1301:1800], data=EasyScaled, mean)
But I got the following error:
Error in TrialTime[3000:8000, 8001:1300, 1301:1800] :
incorrect number of dimensions
Any suggestions or advice would be much appreciated.
Answer 0 (score: 0)
Using cut and ddply is the right idea, but here is some vanilla R chicken scratch that does what you need.
# Generate example data
EasyScaled <- data.frame(
  Trial = paste0(c(sapply(1:3, function(x) rep(x, 9))), "_easy.wav"),
  TrialTime = c(sapply(seq_len(9)-1, function(x) (floor(x/3))*5000 + x%%3 + 3000)),
  PupilBaseCorrect = rnorm(27, 0.78, 0.1)
)
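# group means of PupilBaseCorrect by Trial + time bin
# (bin 1 = 3000-8000, bin 2 = 8001-13000, bin 3 = 13001-18000;
# ceiling() keeps each upper edge in the lower bin, pmax() keeps 3000 in bin 1)
tmp <- tapply(EasyScaled$PupilBaseCorrect,
              paste0(EasyScaled$Trial, ',',
                     pmax(1, ceiling((EasyScaled$TrialTime - 3000)/5000))), mean)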
# melt & recast the array manually into a data frame
# (assumes every trial has all 3 bins, so each trial's means sit side by side in tmp)
EasyTimeBin <- do.call(data.frame,
  append(list(row.names = NULL,
              Trial = gsub('\\.wav,.*', '', names(tmp)[3*seq_len(length(tmp)/3)])),
         structure(lapply(seq_len(3),
                          function(x) tmp[3*(seq_len(length(tmp)/3)-1) + x]
                   ), .Names = paste0("TimeBin", seq_len(3))
         )
  )
)
print(EasyTimeBin)
# Trial TimeBin1 TimeBin2 TimeBin3
# 1 1_easy 0.7471973 0.7850524 0.8939581
# 2 2_easy 0.8096973 0.8390587 0.7757359
# 3 3_easy 0.8151430 0.7855042 0.8081268
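
For comparison, the cut() route mentioned above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the data frame `toy` and its values are made up for the example (they are not the asker's data), and the bin edges are taken from the question. cut() labels each time point with its bin, and tapply() over two grouping factors returns a Trial x TimeBin matrix of means directly, so no manual recasting is needed.

```r
# Hypothetical toy data with the same columns as EasyScaled (values are made up)
toy <- data.frame(
  Trial = rep(c("1_easy.wav", "2_easy.wav"), each = 6),
  TrialTime = rep(c(3000, 8000, 8001, 13000, 13001, 18000), times = 2),
  PupilBaseCorrect = c(1, 3, 2, 4, 5, 7, 10, 30, 20, 40, 50, 70)
)

# cut() assigns each time point to a bin; intervals are (a, b] by default,
# and include.lowest = TRUE also keeps the first boundary (3000) in TimeBin1
toy$TimeBin <- cut(toy$TrialTime,
                   breaks = c(3000, 8000, 13000, 18000),
                   labels = paste0("TimeBin", 1:3),
                   include.lowest = TRUE)

# tapply() over two grouping factors returns a Trial x TimeBin matrix of means
means <- tapply(toy$PupilBaseCorrect, list(toy$Trial, toy$TimeBin), mean)

# Convert to a data frame with the trial name (minus ".wav") as a column
binned <- data.frame(Trial = sub("\\.wav$", "", rownames(means)),
                     means, row.names = NULL)
print(binned)
```

With these toy values the first trial's bin means come out as 2, 3 and 6, because 8000 falls in TimeBin1 and 13000 in TimeBin2. A trial with no data in some bin would get NA in that column rather than shifting the other values.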