Please help me write the following in Python pandas. I have this data:
id=["Train A","Train A","Train A","Train B","Train B","Train B"]
start = ["A","B","C","D","E","F"]
end = ["G","H","I","J","K","L"]
arrival_time = ["0","2016-05-19 13:50:00","2016-05-19 21:25:00","0","2016-05-24 18:30:00","2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:25:00","2016-05-24 12:50:00","2016-05-25 20:00:00","2016-05-26 19:45:00"]
capacity = ["2","2","3","3","2","3"]
which gives the following table:
id arrival_time departure_time start end capacity
Train A 0 2016-05-19 08:25:00 A G 2
Train A 2016-05-19 13:50:00 2016-05-19 16:00:00 B H 2
Train A 2016-05-19 21:25:00 2016-05-20 07:25:00 C I 3
Train B 0 2016-05-24 12:50:00 D J 3
Train B 2016-05-24 18:30:00 2016-05-25 20:00:00 E K 2
Train B 2016-05-26 12:15:00 2016-05-26 19:45:00 F L 3
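I want to add two columns named source and sink. The source is the start of the trip when the time difference between arrival and departure is less than 3 hours, and the sink is set only where the journey breaks, i.e. where time_difference exceeds 3 hours: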
time difference source sink
- A H
02:10:00 A H
10:00:00 C I
- D K
01:30:00 D K
07:30:00 F L
Answer 0 (score: 2)
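The code in this answer operates on a DataFrame named df whose arrival_time and departure_time columns are already datetimes. Here is a minimal sketch of one way to build it. The values are taken from the question's table, and treating "0" as a missing arrival is an assumption.

```python
import pandas as pd

# Values copied from the table in the question; "0" marks a missing arrival.
data = {
    "id": ["Train A", "Train A", "Train A", "Train B", "Train B", "Train B"],
    "arrival_time": ["0", "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                     "0", "2016-05-24 18:30:00", "2016-05-26 12:15:00"],
    "departure_time": ["2016-05-19 08:25:00", "2016-05-19 16:00:00",
                       "2016-05-20 07:25:00", "2016-05-24 12:50:00",
                       "2016-05-25 20:00:00", "2016-05-26 19:45:00"],
    "start": list("ABCDEF"),
    "end": list("GHIJKL"),
    "capacity": ["2", "2", "3", "3", "2", "3"],
}
df = pd.DataFrame(data)

# Blank out the "0" placeholder (assumed to mean "no arrival") so it parses as NaT.
arrival = df["arrival_time"].mask(df["arrival_time"] == "0")
df["arrival_time"] = pd.to_datetime(arrival)
df["departure_time"] = pd.to_datetime(df["departure_time"])
```

With both columns parsed, subtracting them yields a timedelta column in which the first row of each train is NaT.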
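import numpy as np
# Assumes df.arrival_time and df.departure_time are datetime64, with missing arrivals as NaT.
df = df.assign(timediff=(df.departure_time - df.arrival_time))
# .dt.seconds ignores whole days (1 day 01:30 counts as 1.5 h); a stop under 3 h takes the previous row's start.
df = df.assign(source = np.where(df.timediff.dt.seconds / 3600 < 3, df.shift(1).start, df.start))
# If the previous row's stop exceeded 3 h, take the next row's end; otherwise keep this row's end.
df = df.assign(sink = np.where(df.timediff.dt.seconds.shift(1) / 3600 > 3, df.shift(-1).end, df.end))
print(df)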
Output:
id arrival_time departure_time start end capacity sink \
0 Train A NaT 2016-05-19 08:25:00 A G 2 G
1 Train A 2016-05-19 13:50:00 2016-05-19 16:00:00 B H 2 H
2 Train A 2016-05-19 21:25:00 2016-05-20 07:25:00 C I 3 I
3 Train B NaT 2016-05-24 12:50:00 D J 3 K
4 Train B 2016-05-24 18:30:00 2016-05-25 20:00:00 E K 2 K
5 Train B 2016-05-26 12:15:00 2016-05-26 19:45:00 F L 3 L
timediff source
0 NaT A
1 0 days 02:10:00 A
2 0 days 10:00:00 C
3 NaT D
4 1 days 01:30:00 D
5 0 days 07:30:00 F
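Note that the output above gives sink G for the first row, while the question asks for H. One possible alternative, sketched here and not part of the original answer, is to group consecutive rows into legs and take the first start and last end of each leg. The names hours, continues and leg are illustrative. The sketch keeps the answer's use of .dt.seconds, which drops whole days, on the assumption that the 1 day 01:30 stop should count as 1.5 hours, as the question's expected output suggests.

```python
import pandas as pd

# Only the columns needed here; values taken from the question's table.
df = pd.DataFrame({
    "id": ["Train A"] * 3 + ["Train B"] * 3,
    "arrival_time": pd.to_datetime([None, "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                                    None, "2016-05-24 18:30:00", "2016-05-26 12:15:00"]),
    "departure_time": pd.to_datetime(["2016-05-19 08:25:00", "2016-05-19 16:00:00",
                                      "2016-05-20 07:25:00", "2016-05-24 12:50:00",
                                      "2016-05-25 20:00:00", "2016-05-26 19:45:00"]),
    "start": list("ABCDEF"),
    "end": list("GHIJKL"),
})

timediff = df["departure_time"] - df["arrival_time"]

# .dt.seconds keeps only the sub-day part (1 day 01:30 -> 5400 s);
# .dt.total_seconds() would count the full 25.5 hours instead.
hours = timediff.dt.seconds / 3600

# A row continues the previous leg only when its own stop is under 3 hours.
# NaT stops compare as False, so they always open a new leg.
continues = hours < 3
leg = (~continues).cumsum()

# Grouping by id as well keeps a leg from spanning two trains.
df["source"] = df.groupby(["id", leg])["start"].transform("first")
df["sink"] = df.groupby(["id", leg])["end"].transform("last")
print(df[["id", "start", "end", "source", "sink"]])
```

If whole days should count toward the 3-hour threshold, replacing .dt.seconds with .dt.total_seconds() would make the fifth row start its own leg.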