Calculating time differences with python pandas and writing them to CSV

Time: 2013-07-27 07:54:39

Tags: python pandas

  completed             deadline
15-07-2013 23:10    15-07-2013 23:15
16-07-2013 00:20    16-07-2013 00:15
16-07-2013 00:20    16-07-2013 00:15
16-07-2013 21:04    16-07-2013 21:30
16-07-2013 21:58    16-07-2013 22:00
16-07-2013 23:21    16-07-2013 23:15
16-07-2013 23:21    16-07-2013 23:15
17-07-2013 00:19    17-07-2013 00:15
17-07-2013 00:19    17-07-2013 00:15
17-07-2013 21:18    17-07-2013 21:30
17-07-2013 22:07    17-07-2013 22:00

When I evaluate data['completed'] - data['deadline'], I get:

-1 day, 23:55:00 # on time
         0:05:00
         0:05:00
-1 day, 23:34:00 # on time
-1 day, 23:58:00 # on time
         0:06:00
         0:06:00
         0:04:00
         0:04:00
-1 day, 23:48:00 # on time
         0:07:00

But when I assign data['time_delay'] = data['completed'] - data['deadline'] and print data['time_delay'], I get:

-300000000000
300000000000
300000000000
-1560000000000
-120000000000
360000000000
360000000000
240000000000
240000000000
-720000000000
420000000000

I get the same result when the output is written to CSV.

How can I:

  1. Work with this output?

  2. Write the output to CSV in minutes?

  3. Handle the 'on time' rows?

2 answers:

Answer 0: (score: 2)

>>> import sys
>>> import pandas as pd
>>> data = pd.read_csv('1.csv', parse_dates=[0,1])
>>> data['time_delay'] = data['completed'] - data['deadline']
>>> print(data['time_delay'])
0   -00:05:00
1    00:05:00
2    00:05:00
3   -00:26:00
4   -00:02:00
Name: time_delay, dtype: timedelta64[ns]
>>> data.to_csv(sys.stdout)
,completed,deadline,time_delay
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-300000000000
1,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000
2,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-1560000000000
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-120000000000
>>> data['time_delay'] = data['time_delay'].apply(pd.lib.repr_timedelta64)
>>> data.to_csv(sys.stdout)
,completed,deadline,time_delay
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-00:05:00
1,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00
2,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-00:26:00
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-00:02:00

pandas.lib.repr_timedelta64 is undocumented, so this code may break in a future release. (I used pandas 0.11.0.)
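The integers written to the CSV above are nanosecond counts (the column dtype is timedelta64[ns]), so -300000000000 is -300 seconds, or -5 minutes. As a rough sketch of a way to get minutes without the undocumented helper, assuming a newer pandas release than the 0.11.0 used above (one where the `.dt` accessor exists), the difference can be converted with `total_seconds()` before writing. The column name `delay_minutes` and the inline sample data are only illustrative.

```python
import io
import pandas as pd

# Sample rows copied from the question; dates are day-first (dd-mm-yyyy).
raw = io.StringIO(
    "completed,deadline\n"
    "15-07-2013 23:10,15-07-2013 23:15\n"
    "16-07-2013 00:20,16-07-2013 00:15\n"
    "16-07-2013 21:04,16-07-2013 21:30\n"
)
data = pd.read_csv(raw)
for col in ("completed", "deadline"):
    data[col] = pd.to_datetime(data[col], format="%d-%m-%Y %H:%M")

# Signed delay in minutes: negative means early, positive means late.
delay = data["completed"] - data["deadline"]
data["delay_minutes"] = delay.dt.total_seconds() / 60

# Plain floats serialize to CSV as ordinary numbers.
csv_text = data.to_csv(index=False)
print(csv_text)
```

Keeping the sign in a numeric column leaves the "on time" versus "late" decision to whoever reads the CSV.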

Answer 1: (score: 1)

Try this:

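def func(x, y):
  # total_seconds() covers the whole difference; .seconds//60 % 60 would
  # keep only the minutes component and drop any full hours or days.
  if x > y:
    return 'delayed by ' + str(int((x - y).total_seconds() // 60)) + ' minutes'
  else:
    return 'on time by ' + str(int((y - x).total_seconds() // 60)) + ' minutes'


# Apply row by row; x is the completion time, y is the deadline.
data["ontime"] = data.apply(lambda row: func(row["completed"], row["deadline"]), axis=1)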

This gives:

    completed                   deadline              ontime
0   2013-07-15 23:10:00    2013-07-15 23:15:00     on time by 5 minutes
1   2013-07-16 00:20:00    2013-07-16 00:15:00     delayed by 5 minutes
2   2013-07-16 00:20:00    2013-07-16 00:15:00     delayed by 5 minutes
3   2013-07-16 21:04:00    2013-07-16 21:30:00     on time by 26 minutes
4   2013-07-16 21:58:00    2013-07-16 22:00:00     on time by 2 minutes
5   2013-07-16 23:21:00    2013-07-16 23:15:00     delayed by 6 minutes
6   2013-07-16 23:21:00    2013-07-16 23:15:00     delayed by 6 minutes
7   2013-07-17 00:19:00    2013-07-17 00:15:00     delayed by 4 minutes
8   2013-07-17 00:19:00    2013-07-17 00:15:00     delayed by 4 minutes
9   2013-07-17 21:18:00    2013-07-17 21:30:00     on time by 12 minutes
10  2013-07-17 22:07:00    2013-07-17 22:00:00     delayed by 7 minutes
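For larger frames, the same labels could probably be built without a row-wise apply. The following is only a sketch under the assumption of a pandas version that provides the `.dt` accessor; the three sample rows are taken from the table above, and the intermediate names (`minutes`, `magnitude`, `late`, `early`) are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "completed": pd.to_datetime(
        ["2013-07-15 23:10", "2013-07-16 00:20", "2013-07-16 21:04"]),
    "deadline": pd.to_datetime(
        ["2013-07-15 23:15", "2013-07-16 00:15", "2013-07-16 21:30"]),
})

# Signed difference in minutes (positive means completed after the deadline).
minutes = (data["completed"] - data["deadline"]).dt.total_seconds() / 60

# Whole minutes as text, ignoring the sign.
magnitude = minutes.abs().astype(int).astype(str)

late = "delayed by " + magnitude + " minutes"
early = "on time by " + magnitude + " minutes"

# Keep the "late" label where the delay is positive, otherwise use "early".
data["ontime"] = late.where(minutes > 0, early)
print(data)
```

As in the function above, a row completed exactly at the deadline falls into the "on time" branch.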