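I have a large data frame with more than 1 million rows that holds time series data for several individuals (each individual's data is in separate columns). In addition, I have a 3D array of encounter frame numbers, which indicate the frames in the time series where I want to extract data.
For a given individual and encounter (meeting) type, I want to extract a window of, for example, 100 frames. However, since there are many replicates for each meeting type and each individual, I want to directly compute the average time series for each individual and meeting type.
I managed to do this using sapply nested inside two for loops. However, running these for loops is very slow, and I am now wondering whether there is a faster way to implement this calculation in R, or whether I should do it in C++. Below is my code, along with a small part of my data: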
nb_ind = 3;
response_duration = 100;
nb_meeting_types = 2;
nb_variables = 2;
speed_offset = 2;
MEETING_START_OFFSET = 50;
replicate = 20;
# behavior_data is a data frame with columns: frame,speed1,head1,speed2,head2,speed3,head3
# there are about 1 million rows
dim(behavior_data)
[1] 1080000 7
head(behavior_data)
frame speed1 head1 speed2 head2 speed3 head3
1 0 0 25 2.4 179 1.1 16
2 1 1.5 20 2.0 -175 1.6 27
3 2 1.6 28 2.0 -178 1.0 37
4 3 0.8 56 1.6 170 0.8 37
5 4 0.3 56 1.8 162 0 40
# encounters is an array with frame numbers of dimension [nb_ind,replicate,nb_meeting_types]
# these frame numbers correspond to the starting points of meetings, for which I want to calculate the speed
dim(encounters)
[1] 3 20 2
head(encounters[,,1])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 12049 17693 23350 29018 34666 40327 68608 57293 74264 45980 113864 79922 119522 102552 51636 153462 91235 142151 159121 62948
[2,] 12036 17694 23352 29014 34674 40322 68606 57296 74268 45982 113865 79929 119521 102558 51639 153463 91242 142161 159168 62952
[3,] 12037 17694 23351 29011 34669 40329 68606 57298 74263 45985 NA 79921 NA 102550 51641 NA 91234 NA NA 62950
all_average_speeds = array(NaN, c(nb_ind, response_duration, nb_meeting_types))
for (j in 1:nb_ind){
  # column holding the speed of individual j (speed1, speed2, ...)
  speed_col = (j-1)*nb_variables + speed_offset
  for (i in 1:nb_meeting_types){
    # calculate the average speed response across all replicates of a given meeting type for a given individual
    average_speed_type = sapply(1:response_duration, function(k){
      # frames at offset k-1 from each shifted meeting start (one per replicate)
      target_frames = encounters[j,,i] + (k-1) - MEETING_START_OFFSET
      # %in% tests set membership; == would silently recycle the 20 frame numbers along the rows
      mean(behavior_data[behavior_data$frame %in% target_frames, speed_col], na.rm=TRUE)
    })
    # store directly in the [individual, time, meeting type] slot so the dimensions line up
    all_average_speeds[j,,i] = average_speed_type
  }
}
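One possible way to avoid scanning the whole data frame for every time step, given as a minimal sketch rather than a benchmarked solution: for each individual and meeting type, build the full matrix of frame numbers with outer(), look up the rows once with match(), and average across replicates with rowMeans(). The function name average_responses and its argument names are hypothetical. The sketch assumes that frame values are unique in behavior_data$frame and that the speed columns are laid out as described above.

```r
# Hypothetical helper (not part of the original code): vectorized window averaging.
# Assumes behavior_data$frame contains unique frame numbers.
average_responses <- function(behavior_data, encounters, response_duration,
                              nb_variables = 2, speed_offset = 2,
                              meeting_start_offset = 50) {
  nb_ind <- dim(encounters)[1]
  nb_meeting_types <- dim(encounters)[3]
  out <- array(NaN, c(nb_ind, response_duration, nb_meeting_types))
  # offset of each time step relative to the encounter frame
  offsets <- seq_len(response_duration) - 1 - meeting_start_offset
  for (j in seq_len(nb_ind)) {
    # speed column of individual j (speed1, speed2, ...)
    speed <- behavior_data[[(j - 1) * nb_variables + speed_offset]]
    for (i in seq_len(nb_meeting_types)) {
      starts <- encounters[j, , i]
      starts <- starts[!is.na(starts)]  # drop missing replicates
      # response_duration x n_replicates matrix of frame numbers
      frames <- outer(offsets, starts, "+")
      # row index of every requested frame; NA when the frame is not in the data
      rows <- match(frames, behavior_data$frame)
      windows <- matrix(speed[rows], nrow = response_duration)
      # average across replicates for each time step
      out[j, , i] <- rowMeans(windows, na.rm = TRUE)
    }
  }
  out
}
```

With the objects defined above, the call would be `all_average_speeds <- average_responses(behavior_data, encounters, response_duration)`. One difference from the loop version: if two replicates had the same start frame, this sketch counts that frame twice in the average, whereas `%in%` counts it once.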