Question

We are modelling the delay at a server that can serve only one customer at a time. Suppose we have two data frames: agg_data and ind_data.

> agg_data
  minute service_minute
1      0    1
2     60    3
3    120    2
4    180    3
5    240    2
6    300    4

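For each hour, agg_data gives the service time between two consecutive customers (service_minute). For example, between minutes 60 and 120 (the second hour from the start), a new customer can be served every 3 minutes, so 20 customers can be served in that hour.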
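The per-hour capacity implied by this description is simply 60 / service_minute. A minimal sketch, with agg_data copied from the listing above (capacity_per_hour is a name introduced here for illustration):

```r
# Hourly service times from the agg_data listing above
agg_data <- data.frame(minute = c(0, 60, 120, 180, 240, 300),
                       service_minute = c(1, 3, 2, 3, 2, 4))

# If one customer is served every service_minute minutes,
# a 60-minute hour can serve at most 60 / service_minute customers
capacity_per_hour <- 60 / agg_data$service_minute
capacity_per_hour
# [1] 60 20 30 20 30 15
```

The second entry, 20, is the figure quoted for the hour from minute 60 to 120.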

ind_data gives the arrival time of each customer:

         Arrival
1             51
2             63
3            120
4            121
5            125
6            129

I need to generate departure times for the customers, taking into account the service_minute values in agg_data.

The expected output is:

         Arrival              Dep
1             51               52
2             63               66
3            120              122
4            121              124
5            125              127
6            129              131

Here is my current code, which is correct but very inefficient:

ind_data$Dep = rep(0, nrow(ind_data))
# After the service time, the first customer can leave the system with no delay.
# Service time is taken as that of the hour in which the customer arrives.
ind_data$Dep[1] = ind_data$Arrival[1] + agg_data[max(which(agg_data$minute <= ind_data$Arrival[1])), 'service_minute']

# For customers after the first one:
# if they arrive when there is no delay (arrival time > departure time of the previous customer),
# then the service time is that of the hour in which they arrive, and
# the departure time is arrival time + service time;
# if they arrive when there is delay (arrival time < departure time of the previous customer),
# then the service time is that of the hour in which the previous customer leaves the system, and
# the departure time is the departure time of the previous customer + service time.

for (i in 2:nrow(ind_data)) {
  ind_data$Dep[i] = max(
    ind_data$Dep[i-1] + agg_data[max(which(agg_data$minute <= ind_data$Dep[i-1])), 'service_minute'],
    ind_data$Arrival[i] + agg_data[max(which(agg_data$minute <= ind_data$Arrival[i])), 'service_minute']
  )
}
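Written as a recurrence, the loop above computes the following. The notation is introduced here for illustration: $A_i$ is the arrival time of customer $i$, $D_i$ its departure time, and $s(t)$ the service_minute of the agg_data row with the largest minute not exceeding $t$.

$$
\begin{aligned}
D_1 &= A_1 + s(A_1),\\
D_i &= \max\bigl(D_{i-1} + s(D_{i-1}),\; A_i + s(A_i)\bigr), \qquad i \ge 2.
\end{aligned}
$$

For customer 4, for example, $D_4 = \max(122 + 2,\ 121 + 2) = 124$. Because each $D_i$ depends on $D_{i-1}$, this part cannot be computed by a single vectorized lookup; only the evaluation of $s(\cdot)$ can be sped up.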

I think the step that takes a long time is searching agg_data for the correct service time. Is there a more efficient algorithm?
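The lookup step can be checked in isolation. The question's max(which(agg_data$minute <= t)) scans the whole minute column for every t, whereas base R's findInterval() returns the same row index for a whole vector of times at once; its help page describes an interval search with O(n log N) complexity. A minimal sketch, assuming agg_data$minute is sorted ascending and every time is at or after minute 0 (data copied from the question; idx_which and idx_find are illustrative names):

```r
agg_data <- data.frame(minute = c(0, 60, 120, 180, 240, 300),
                       service_minute = c(1, 3, 2, 3, 2, 4))
arrival <- c(51, 63, 120, 121, 125, 129)

# Lookup used in the question: a full scan of agg_data$minute for each time
idx_which <- sapply(arrival, function(t) max(which(agg_data$minute <= t)))

# findInterval() returns, for each time, the number of breakpoints <= that time,
# which is the same row index, computed for the whole vector in one call
idx_find <- findInterval(arrival, agg_data$minute)

stopifnot(all(idx_which == idx_find))
idx_find
# [1] 1 2 3 3 3 3
```

With only six rows in agg_data the scan itself is cheap; the difference grows with the number of hourly rows.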

Thanks.

Answer 1

This should be quite efficient. It is a very simple lookup problem with an obvious vectorized solution:

out <- data.frame(Arrival = ind_data$Arrival,
                  Dep = ind_data$Arrival + agg_data$service_minute[
                    # row of agg_data for the hour containing each arrival
                    findInterval(ind_data$Arrival, agg_data$minute)]
)

> out
  Arrival Dep
1      51  52
2      63  66
3     120 122
4     121 123
5     125 127
6     129 131

I trust my code more than your example; I think the example has an obvious error.
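Row 4 of the output above (123) differs from the question's expected 124. Under the single-server rule in the question's code comments, customer 4 arrives at 121 while customer 3 is still in service until 122, so a lookup on arrival time alone does not capture the queueing delay. The sketch below keeps the vectorized findInterval() lookup for the arrival hours but still makes one sequential pass, since each departure depends on the previous one. compute_departures is a hypothetical helper name; the sketch assumes arrivals are sorted, there is at least one customer, and agg_data$minute is sorted and starts at or before the first arrival.

```r
agg_data <- data.frame(minute = c(0, 60, 120, 180, 240, 300),
                       service_minute = c(1, 3, 2, 3, 2, 4))
ind_data <- data.frame(Arrival = c(51, 63, 120, 121, 125, 129))

# Hypothetical helper: departure times for a single first-come, first-served server
compute_departures <- function(arrival, minute, service) {
  # Service time of the hour in which each customer arrives (vectorized)
  s_arr <- service[findInterval(arrival, minute)]
  # Accumulate into a plain numeric vector rather than a data frame column
  dep <- numeric(length(arrival))
  dep[1] <- arrival[1] + s_arr[1]
  for (i in seq_along(arrival)[-1]) {
    # Service time of the hour in which the previous customer leaves
    s_prev <- service[findInterval(dep[i - 1], minute)]
    dep[i] <- max(dep[i - 1] + s_prev, arrival[i] + s_arr[i])
  }
  dep
}

ind_data$Dep <- compute_departures(ind_data$Arrival,
                                   agg_data$minute,
                                   agg_data$service_minute)
ind_data
```

On the question's data this yields departures 52, 66, 122, 124, 127 and 131, matching the expected output. Filling a plain vector and assigning the column once avoids repeated element-wise assignment into ind_data$Dep inside the loop, which is commonly a noticeable cost in R.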
