Calculating the time difference between two Pandas DataFrame columns for each day

Time: 2016-07-05 21:14:56

Tags: python pandas

Please see the for_stack DataFrame data, saved via for_stack.to_pickle('for_stack').

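As shown in the image below, I need to add a new column showing the amount of time between the Timestamp (09:30) and 'Gap Lower Closed First', which is 403 minutes for row 1.

I need to do this for the first row of each day, at 09:30, as highlighted. Ideally, I would like a new DataFrame that shows only the 09:30 entry for each day, if that is possible.

Thanks for your help.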

for_stack image

I tried the following (incorrect) code to get a timedelta, but I only get NaT:

data['tvalue'] = data.index
data['delta'] = (data['Gap Lower Closed first'] - data['tvalue'])

'Gap Lower Closed first' has dtype: datetime64[ns]

2 answers:

Answer 0 (score: 2)

You can use pandas.TimeGrouper (I couldn't find documentation for it) together with the first aggregate.

Example:

In [26]: df = pd.DataFrame(index=pd.date_range('2016-01-01T09:30:00', periods=10, freq='30min').union(pd.date_range('2016-01-02T09:30:00', periods=10, freq='30min')), data={'a': np.random.randn(20)})


In [27]: df
Out[27]: 
                            a
2016-01-01 09:30:00 -0.693846
2016-01-01 10:00:00  1.627871
2016-01-01 10:30:00 -0.157882
2016-01-01 11:00:00  0.126959
2016-01-01 11:30:00 -0.865513
2016-01-01 12:00:00  0.042917
2016-01-01 12:30:00 -0.260965
2016-01-01 13:00:00  1.813741
2016-01-01 13:30:00 -1.108866
2016-01-01 14:00:00  1.030709
2016-01-02 09:30:00 -0.063701
2016-01-02 10:00:00 -0.695245
2016-01-02 10:30:00 -0.945378
2016-01-02 11:00:00 -0.394078
2016-01-02 11:30:00  2.005444
2016-01-02 12:00:00  0.920097
2016-01-02 12:30:00  0.329173
2016-01-02 13:00:00  1.951834
2016-01-02 13:30:00 -2.143820
2016-01-02 14:00:00 -0.357149

In [28]: df.groupby(pd.TimeGrouper(freq='1d')).first()
Out[28]: 
                   a
2016-01-01 -0.693846
2016-01-02 -0.063701

In your case, you could do:

# Compute the delta before grouping: after grouping, the index holds the day (midnight), not 09:30
df['delta'] = df['Gap Lower Closed first'] - df.index
dfg = df.groupby(pd.TimeGrouper(freq='1d')).first()
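
Note that pd.TimeGrouper was deprecated and later removed from pandas; pd.Grouper(freq=...) is its replacement. The following is a minimal sketch of the same idea on a current pandas version, not the asker's actual data: the sample values and the delta_minutes column name are assumptions made for illustration. The delta is computed on the original rows, because the grouped result is indexed by day (midnight) rather than by 09:30.

```python
import pandas as pd

# Hypothetical sample data: two days of 30-minute bars starting at 09:30.
index = pd.date_range('2016-01-01 09:30', periods=3, freq='30min').union(
    pd.date_range('2016-01-02 09:30', periods=3, freq='30min'))
closed = pd.to_datetime(['2016-01-01 16:13'] * 3 + ['2016-01-02 10:45'] * 3)
df = pd.DataFrame({'Gap Lower Closed first': closed}, index=index)

# Compute the delta on the original rows, while the index still holds 09:30.
df['delta'] = df['Gap Lower Closed first'] - df.index

# One row per calendar day: the first value of each column on that day.
dfg = df.groupby(pd.Grouper(freq='1D')).first()

# Express the delta in minutes (delta_minutes is a hypothetical column name).
dfg['delta_minutes'] = dfg['delta'].dt.total_seconds() / 60
```

With this sample data, the first day has a delta of 403 minutes, which matches the figure quoted in the question. Keep in mind that first() picks the first non-null value per column, so a day whose 09:30 row holds NaT would report a later row's value.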

Answer 1 (score: 1)

I think this does what you want:

import pandas as pd
import numpy as np
import datetime

data = {'t1':[datetime.datetime(2014, 3, 10, 9, 30, 0), 
              datetime.datetime(2014, 3, 10, 8, 33, 0)], 
        't2':[datetime.datetime(2014, 3, 11, 10, 34, 0), 
              datetime.datetime(2014, 3, 10, 11, 41, 9)]
        }

df = pd.DataFrame(data)
df = df.set_index('t1')
df['t_diff'] = df.t2 - df.index

In [15]: df
Out[15]:
                                     t2          t_diff
t1
2014-03-10 09:30:00 2014-03-11 10:34:00 1 days 01:04:00
2014-03-10 08:33:00 2014-03-10 11:41:09 0 days 03:08:09


df930 = df[np.logical_and(df.index.hour == 9, df.index.minute == 30)]

In [24]: df930
Out[24]:
                                     t2          t_diff
t1
2014-03-10 09:30:00 2014-03-11 10:34:00 1 days 01:04:00

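I always use np.logical_and to combine multiple conditions, because plain Python and tries to reduce each whole array to a single truth value instead of comparing element by element, which fails with: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()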
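
As a small sketch of the point above, reusing the sample data from this answer: the element-wise & operator (with parentheses around each comparison) and DataFrame.at_time should select the same 09:30 rows as np.logical_and. The variable names here are hypothetical and only serve the illustration.

```python
import datetime

import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    't1': [datetime.datetime(2014, 3, 10, 9, 30, 0),
           datetime.datetime(2014, 3, 10, 8, 33, 0)],
    't2': [datetime.datetime(2014, 3, 11, 10, 34, 0),
           datetime.datetime(2014, 3, 10, 11, 41, 9)],
}).set_index('t1')

# Three ways to keep only the rows stamped 09:30.
with_logical_and = df[np.logical_and(df.index.hour == 9, df.index.minute == 30)]
with_ampersand = df[(df.index.hour == 9) & (df.index.minute == 30)]
with_at_time = df.at_time('09:30')

# Plain `and` needs a single True/False, so it raises on arrays.
try:
    (df.index.hour == 9) and (df.index.minute == 30)
    raised = False
except ValueError:
    raised = True
```

at_time needs a DatetimeIndex, which is the case here after set_index('t1'); it also avoids building the boolean mask by hand.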