Is there a faster way to run an sapply nested inside two for loops?

Time: 2017-01-27 11:47:34

Tags: r for-loop sapply

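I have a large data frame with more than 1 million rows that holds time-series data for several individuals (each individual's data is in its own columns). In addition, I have a 3D array of encounter frame numbers, which indicate the frames in the time series from which I want to extract data.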

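For a given individual and meeting type, I want to extract a window of, for example, 100 frames. However, because there are many replicates for each meeting type and each individual, I want to directly compute the average time series for each individual and meeting type.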

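I managed to do this with an sapply nested inside two for loops. However, running these for loops is very slow, and I am now wondering whether there is a faster way to do this computation in R, or whether I should do it in C++. Below is my code, along with a small part of my data: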

nb_ind = 3;
response_duration = 100;
nb_meeting_types = 2;
nb_variables = 2;
speed_offset = 2;
MEETING_START_OFFSET = 50;
replicate = 20;

# behavior_data is a data frame with columns: frame,speed1,head1,speed2,head2,speed3,head3
# there are about 1 million rows
dim(behavior_data)
[1] 1080000  7

head(behavior_data)
  frame speed1 head1 speed2 head2 speed3 head3
1  0      0     25    2.4   179     1.1   16
2  1      1.5   20    2.0   -175    1.6   27
3  2      1.6   28    2.0   -178    1.0   37
4  3      0.8   56    1.6   170     0.8   37
5  4      0.3   56    1.8   162     0     40

# encounters is an array with frame numbers of dimension [nb_ind,replicate,nb_meeting_types]
# these frame numbers correspond to the starting points of meetings, for which I want to calculate the speed
dim(encounters)
[1] 3 20  2

head(encounters[,,1])
    [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]  [,11] [,12]  [,13]  [,14] [,15]  [,16] [,17]  [,18]  [,19] [,20]
[1,] 12049 17693 23350 29018 34666 40327 68608 57293 74264 45980 113864 79922 119522 102552 51636 153462 91235 142151 159121 62948
[2,] 12036 17694 23352 29014 34674 40322 68606 57296 74268 45982 113865 79929 119521 102558 51639 153463 91242 142161 159168 62952
[3,] 12037 17694 23351 29011 34669 40329 68606 57298 74263 45985     NA 79921     NA 102550 51641     NA 91234     NA     NA 62950


all_average_speeds = array(NaN, c(nb_ind, response_duration, nb_meeting_types))
for (j in 1:nb_ind){

  # speed column of individual j (columns are frame, speed1, head1, speed2, ...)
  speed_j = behavior_data[, (j-1)*nb_variables + speed_offset]

  for (i in 1:nb_meeting_types){

    # calculate the average speed response across all replicates of a given meeting type for a given individual
    average_speed_type = sapply(1:response_duration, function(k){
      # frames at offset k-1 from the shifted start of every replicate
      target_frames = (encounters[j,,i] + k-1) - MEETING_START_OFFSET
      # %in% rather than ==, because == would recycle the 20 frame numbers along all rows
      mean(speed_j[behavior_data$frame %in% target_frames], na.rm=TRUE)
    })
    # store directly in the [frame, type] slice; an rbind of the types would be [type, frame]
    all_average_speeds[j,,i] = average_speed_type
  }
}
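
The loop above scans the full frame column once for every individual, meeting type and frame offset. One possible way to avoid those repeated scans, staying in R, is to build all target frame numbers for a given individual and meeting type at once and look them up with a single match() call. The following is only a minimal sketch under stated assumptions: frame numbers are unique within behavior_data, and the toy data (n_frames, the random speeds, the sampled encounters) are hypothetical stand-ins for the real data. The names offsets, target_frames and speeds are introduced here for illustration.

```r
# Toy stand-in for the real data (hypothetical values, far fewer rows than 1,080,000).
set.seed(1)
nb_ind = 3; response_duration = 100; nb_meeting_types = 2
nb_variables = 2; speed_offset = 2; MEETING_START_OFFSET = 50; replicate = 20
n_frames = 5000
behavior_data = data.frame(frame = 0:(n_frames - 1),
                           speed1 = runif(n_frames), head1 = 0,
                           speed2 = runif(n_frames), head2 = 0,
                           speed3 = runif(n_frames), head3 = 0)
encounters = array(sample(200:4000, nb_ind * replicate * nb_meeting_types),
                   c(nb_ind, replicate, nb_meeting_types))
encounters[3, c(11, 13), 1] = NA   # some meetings are missing, as in the real array

# Assumption: frame numbers are unique, so match() maps each frame to exactly one row.
all_average_speeds = array(NaN, c(nb_ind, response_duration, nb_meeting_types))
offsets = 0:(response_duration - 1) - MEETING_START_OFFSET
for (j in 1:nb_ind) {
  speed_j = behavior_data[[(j - 1) * nb_variables + speed_offset]]
  for (i in 1:nb_meeting_types) {
    # [response_duration x replicate] matrix of target frame numbers
    target_frames = outer(offsets, encounters[j, , i], "+")
    # look up every row in one call; NA or unknown frames yield NA speeds
    speeds = matrix(speed_j[match(target_frames, behavior_data$frame)],
                    nrow = response_duration)
    # average across replicates for each frame offset
    all_average_speeds[j, , i] = rowMeans(speeds, na.rm = TRUE)
  }
}
```

In this sketch each (individual, meeting type) pair needs one lookup instead of response_duration full-column comparisons. Note one difference from the %in% version: if two replicates pointed at the same frame, match() would count that frame once per replicate rather than once overall.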

0 answers:

No answers