We are modelling the delay at a server that can serve only one customer at a time. Suppose we have two data frames, agg_data and ind_data.
> agg_data
  minute service_minute
1      0              1
2     60              3
3    120              2
4    180              3
5    240              2
6    300              4
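agg_data gives, for each hour, the service time between two consecutive customers. For example, between minutes 60 and 120 (the second hour from the start) we can serve a new customer every 3 minutes, so we can serve 20 customers in that hour.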
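To make the capacity arithmetic concrete, here is a minimal sketch that rebuilds agg_data from the printout above and derives how many customers fit in each hour. The capacity column is an illustrative name that is not in the original data, and the calculation assumes each row covers exactly 60 minutes with the server never idle.

```r
# Data copied from the agg_data printout in the question
agg_data <- data.frame(
  minute         = c(0, 60, 120, 180, 240, 300),
  service_minute = c(1, 3, 2, 3, 2, 4)
)

# Customers served per hour if the server is busy for the whole hour
# (assumption: every row spans exactly 60 minutes)
agg_data$capacity <- 60 / agg_data$service_minute

agg_data$capacity[2]   # second hour: 60 / 3 = 20 customers
```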
ind_data gives the arrival time of each customer:
  Arrival
1      51
2      63
3     120
4     121
5     125
6     129
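I need to generate a departure time for each customer, taking into account the service_minute values in agg_data.
The output should look like this: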
  Arrival Dep
1      51  52
2      63  66
3     120 122
4     121 124
5     125 127
6     129 131
Here is my current code, which is correct but very inefficient:
ind_data$Dep = rep(0, nrow(ind_data))
# After the service time, the first customer can leave the system with no delay
# Service time is taken as that of the hour when the customer arrives
ind_data$Dep[1] = ind_data$Arrival[1] + agg_data[max(which(agg_data$minute <= ind_data$Arrival[1])), 'service_minute']
# For customers after the first one,
# if they arrive when there is no delay (arrival time > departure time of the previous customer),
# then the service time is that of the hour when they arrive and
# the departure time is arrival time + service time;
# if they arrive when there is delay (arrival time < departure time of the previous customer),
# then the service time is that of the hour when the previous customer leaves the system and
# the departure time is the departure time of the previous customer + service time.
for (i in 2:nrow(ind_data)) {
  ind_data$Dep[i] = max(
    ind_data$Dep[i-1] + agg_data[max(which(agg_data$minute <= ind_data$Dep[i-1])), 'service_minute'],
    ind_data$Arrival[i] + agg_data[max(which(agg_data$minute <= ind_data$Arrival[i])), 'service_minute']
  )
}
I think the slow step is searching agg_data for the right service time. Is there a more efficient algorithm?
Thanks.
Answer 0 (score: 2)
This should be fairly efficient. It is a very simple lookup problem with an obvious vectorized solution:
out <- data.frame(Arrival = ind_data$Arrival,
                  Dep = ind_data$Arrival + agg_data$service_minute[  # need an index to choose min
                          findInterval(ind_data$Arrival, agg_data$minute)]
                  )
> out
  Arrival Dep
1      51  52
2      63  66
3     120 122
4     121 123
5     125 127
6     129 131
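To show what the lookup is doing, the sketch below runs findInterval on its own with the values from the question. The vector names arrival, hour_starts and service are illustrative stand-ins for the data frame columns. findInterval returns, for each arrival, the index of the last hour start that is less than or equal to it, and it assumes the hour starts are sorted in non-decreasing order.

```r
arrival     <- c(51, 63, 120, 121, 125, 129)   # ind_data$Arrival
hour_starts <- c(0, 60, 120, 180, 240, 300)    # agg_data$minute (must be sorted)
service     <- c(1, 3, 2, 3, 2, 4)             # agg_data$service_minute

# Row of agg_data that applies to each arrival
idx <- findInterval(arrival, hour_starts)
idx            # 1 2 3 3 3 3

# Service time picked for each customer
service[idx]   # 1 3 2 2 2 2
```

Because the whole vector of arrivals is looked up in a single call, this replaces the repeated max(which(...)) search for each customer.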
I trust my code more than your example; I think the example has an obvious error.
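The two outputs differ only for the customer arriving at minute 121. The comments in the question describe a queueing rule in which a customer cannot leave before the previous customer's departure time plus a service time, and a lookup on arrival time alone does not apply that rule. If that rule is intended, one possible sketch keeps the loop but does the arrival lookup once with findInterval and works on plain vectors instead of data frame cells. The names starts, service, arrival, no_wait and queued are illustrative, and the speed-up is expected rather than measured.

```r
agg_data <- data.frame(minute         = c(0, 60, 120, 180, 240, 300),
                       service_minute = c(1, 3, 2, 3, 2, 4))
ind_data <- data.frame(Arrival = c(51, 63, 120, 121, 125, 129))

# Plain vectors avoid repeated data frame indexing inside the loop
starts  <- agg_data$minute
service <- agg_data$service_minute
arrival <- ind_data$Arrival
n       <- length(arrival)

# Departure time if nobody is queued ahead: one vectorized lookup
no_wait <- arrival + service[findInterval(arrival, starts)]

dep    <- numeric(n)
dep[1] <- no_wait[1]
for (i in seq_len(n)[-1]) {
  # Departure if the customer waits for the previous one, using the
  # service time of the hour in which the previous customer leaves
  queued <- dep[i - 1] + service[findInterval(dep[i - 1], starts)]
  dep[i] <- max(queued, no_wait[i])
}
ind_data$Dep <- dep
ind_data$Dep   # 52 66 122 124 127 131
```

The loop stays because each departure depends on the previous one, but every iteration now does a single binary-search lookup instead of two full scans of agg_data.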