如何使用pandas使用if语句添加新列?

时间:2017-05-10 15:02:45

标签: python pandas

请帮助我在python pandas中编写以下概念,我有以下数据类型:

id=["Train A","Train A","Train A","Train B","Train B","Train B"]
start = ["A","B","C","D","E","F"]
end = ["G","H","I","J","K","L"]
arrival_time = ["0"," 2016-05-19 13:50:00","2016-05-19 21:25:00","0","2016-05-24 18:30:00","2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:25:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]
capacity = ["2","2","3","3","2","3"]

获取以下数据:

id         arrival_time         departure_time         start  end  capacity

Train A          0                  2016-05-19 08:25:00   A     G    2
Train A   2016-05-19 13:50:00       2016-05-19 16:00:00   B     H    2
Train A   2016-05-19 21:25:00       2016-05-20 07:25:00   C     I    3
Train B          0                  2016-05-24 12:50:00   D     J    3
Train B   2016-05-24 18:30:00       2016-05-25 20:00:00   E     K    2
Train B   2016-05-26 12:15:00       2016-05-26 19:45:00   F     L    3

我想添加一个名为source and sink的列,如果到达和离开之间的时差小于3小时,则源是行程的开始,而接收器仅在行程中断时(即time_difference)超过3个小时,

time difference   source     sink
     -              A         H
     02:10:00       A         H
     10:00:00       C         I
     -              D         K
     01:30:00       D         K
     19:30:00       F         L

1 个答案:

答案 0 :(得分:2)

df = df.assign(timediff=(df.departure_time - df.arrival_time))

df = df.assign(source = np.where(df.timediff.dt.seconds / 3600 < 3, df.shift(1).start, df.start))

df = df.assign(sink = np.where(df.timediff.dt.seconds.shift(1) / 3600 > 3, df.shift(-1).end, df.end))

print(df)

输出:

        id        arrival_time      departure_time start end  capacity sink  \
0  Train A                 NaT 2016-05-19 08:25:00     A   G         2    G   
1  Train A 2016-05-19 13:50:00 2016-05-19 16:00:00     B   H         2    H   
2  Train A 2016-05-19 21:25:00 2016-05-20 07:25:00     C   I         3    I   
3  Train B                 NaT 2016-05-24 12:50:00     D   J         3    K   
4  Train B 2016-05-24 18:30:00 2016-05-25 20:00:00     E   K         2    K   
5  Train B 2016-05-26 12:15:00 2016-05-26 19:45:00     F   L         3    L   

         timediff source  
0             NaT      A  
1 0 days 02:10:00      A  
2 0 days 10:00:00      C  
3             NaT      D  
4 1 days 01:30:00      D  
5 0 days 07:30:00      F