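See the data, saved via for_stack.to_pickle('for_stack').
As shown in the image, I need to add a new column showing the amount of time between the Timestamp (09:30) and 'Gap Lower Closed First'; for row 1 this is 403 minutes.
I need to do this for the first row of each day, at 09:30, as highlighted. Ideally, if possible, I would like a new DataFrame that shows only the 09:30 entry for each day.
Thanks for your help.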
I tried computing a timedelta with the following (incorrect) code, but I only get NaT:
data['tvalue'] = data.index
data['delta'] = (data['Gap Lower Closed first'] - data['tvalue'])
'Gap Lower Closed first' has dtype datetime64[ns].
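A minimal sketch of the row-wise subtraction on made-up data (only the column name comes from the question; the timestamps are hypothetical). Subtracting the index from a datetime64 column gives one timedelta per row, dt.total_seconds() / 60 converts it to minutes, and any row whose column value is NaT produces NaT, which is one possible reason for seeing NaT in most rows.

```python
import pandas as pd

# Hypothetical data: a 09:30 row whose gap closed at 16:13 (403 minutes later)
# and a second row where the column is missing (NaT).
data = pd.DataFrame(
    {'Gap Lower Closed first': [pd.Timestamp('2016-01-04 16:13:00'), pd.NaT]},
    index=pd.to_datetime(['2016-01-04 09:30:00', '2016-01-04 09:31:00']),
)

# Row-wise subtraction: datetime64 column minus DatetimeIndex gives timedelta64.
data['delta'] = data['Gap Lower Closed first'] - data.index

# Convert to minutes; NaT becomes NaN after the conversion.
data['delta_minutes'] = data['delta'].dt.total_seconds() / 60
print(data)
```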
Answer 0 (score: 2)
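You can use pandas.TimeGrouper (I couldn't find documentation for it) together with the first aggregation. Note that TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; in current versions pd.Grouper(freq='1D') does the same job.
Example (assuming import numpy as np and import pandas as pd):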
In [26]: df = pd.DataFrame(index=pd.date_range('2016-01-01T09:30:00', periods=10, freq='30min').union(pd.date_range('2016-01-02T09:30:00', periods=10, freq='30min')), data={'a': np.random.randn(20)})
In [27]: df
Out[27]:
a
2016-01-01 09:30:00 -0.693846
2016-01-01 10:00:00 1.627871
2016-01-01 10:30:00 -0.157882
2016-01-01 11:00:00 0.126959
2016-01-01 11:30:00 -0.865513
2016-01-01 12:00:00 0.042917
2016-01-01 12:30:00 -0.260965
2016-01-01 13:00:00 1.813741
2016-01-01 13:30:00 -1.108866
2016-01-01 14:00:00 1.030709
2016-01-02 09:30:00 -0.063701
2016-01-02 10:00:00 -0.695245
2016-01-02 10:30:00 -0.945378
2016-01-02 11:00:00 -0.394078
2016-01-02 11:30:00 2.005444
2016-01-02 12:00:00 0.920097
2016-01-02 12:30:00 0.329173
2016-01-02 13:00:00 1.951834
2016-01-02 13:30:00 -2.143820
2016-01-02 14:00:00 -0.357149
In [28]: df.groupby(pd.Grouper(freq='1D')).first()
Out[28]:
a
2016-01-01 -0.693846
2016-01-02 -0.063701
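For current pandas versions, a sketch of the same per-day grouping with pd.Grouper, checked against resample('1D').first(), which gives the same result. The frame is rebuilt with random data, so only the structure of the output matters; note that each group is labelled with the day at midnight, not 09:30.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: two days of 30-minute bars starting at 09:30.
idx = pd.date_range('2016-01-01 09:30', periods=10, freq='30min').union(
    pd.date_range('2016-01-02 09:30', periods=10, freq='30min'))
df = pd.DataFrame({'a': np.random.randn(20)}, index=idx)

# pd.Grouper(freq='1D') replaces the removed pd.TimeGrouper.
by_grouper = df.groupby(pd.Grouper(freq='1D')).first()
# resample('1D').first() is an equivalent spelling.
by_resample = df.resample('1D').first()

# Each day's first value comes from the 09:30 bar; labels are midnight.
print(by_grouper)
```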
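In your case, you could do:
df['delta'] = df['Gap Lower Closed first'] - df.index  # subtract before grouping: the grouped index is midnight, not 09:30
dfg = df.groupby(pd.Grouper(freq='1D')).first()  # formerly pd.TimeGrouper(freq='1d')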
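A runnable sketch of this approach on hypothetical minute data (the rows and gap times are invented; only the column name comes from the question). The delta is computed per row before grouping, because after grouping the index holds each day at midnight, so subtracting it would add 9.5 hours to every result.

```python
import pandas as pd

# Hypothetical minute bars for two days; only the column name is from the question.
idx = pd.to_datetime(['2016-01-04 09:30', '2016-01-04 09:31',
                      '2016-01-05 09:30', '2016-01-05 09:31'])
df = pd.DataFrame(
    {'Gap Lower Closed first': pd.to_datetime(
        ['2016-01-04 16:13', '2016-01-04 16:13', '2016-01-05 10:00', '2016-01-05 10:00'])},
    index=idx,
)

# Time from each row's own timestamp to the gap close, as timedelta and minutes.
df['delta'] = df['Gap Lower Closed first'] - df.index
df['delta_minutes'] = df['delta'].dt.total_seconds() / 60

# First row of each day (the 09:30 row in this data), one row per day.
daily = df.groupby(pd.Grouper(freq='1D')).first()
print(daily[['delta', 'delta_minutes']])
```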
Answer 1 (score: 1)
I think this does what you want:
import pandas as pd
import numpy as np
import datetime
data = {'t1':[datetime.datetime(2014, 3, 10, 9, 30, 0),
datetime.datetime(2014, 3, 10, 8, 33, 0)],
't2':[datetime.datetime(2014, 3, 11, 10, 34, 0),
datetime.datetime(2014, 3, 10, 11, 41, 9)]
}
df = pd.DataFrame(data)
df = df.set_index('t1')
df['t_diff'] = df.t2 - df.index
In [15]: df
Out[15]:
t2 t_diff
t1
2014-03-10 09:30:00 2014-03-11 10:34:00 1 days 01:04:00
2014-03-10 08:33:00 2014-03-10 11:41:09 0 days 03:08:09
df930 = df[np.logical_and(df.index.hour == 9, df.index.minute == 30)]
In [24]: df930
Out[24]:
t2 t_diff
t1
2014-03-10 09:30:00 2014-03-11 10:34:00 1 days 01:04:00
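I always combine multiple conditions with np.logical_and, because Python's plain and tries to turn each boolean array into a single True/False value and fails with:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The element-wise & operator also works, provided each condition is wrapped in parentheses: (df.index.hour == 9) & (df.index.minute == 30).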
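A short sketch of the difference on a hypothetical three-row frame: np.logical_and and the & operator build the same element-wise mask, while and raises the ValueError quoted above. DataFrame.at_time('09:30') is another way to select the 09:30 rows directly from a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical index with one 09:30 row and two others.
idx = pd.to_datetime(['2014-03-10 09:30', '2014-03-10 09:45', '2014-03-10 10:30'])
df = pd.DataFrame({'x': [1, 2, 3]}, index=idx)

# Element-wise AND: both spellings give the same boolean mask.
mask_np = np.logical_and(df.index.hour == 9, df.index.minute == 30)
mask_op = (df.index.hour == 9) & (df.index.minute == 30)  # parentheses are required

# Python's `and` needs one bool per operand, so it raises for arrays.
raised = False
try:
    (df.index.hour == 9) and (df.index.minute == 30)
except ValueError as err:
    raised = True
    print('and failed:', err)

# DataFrame.at_time selects rows at a given time of day.
df930 = df.at_time('09:30')
print(df930)
```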