I have a DataFrame loaded like this:
minData = pd.read_csv(
currentSymbol["fullpath"],
header = None,
names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends'],
parse_dates = [["Date", "Time"]],
date_parser = lambda x : datetime.datetime.strptime(x, '%Y%m%d %H%M'),
index_col = "Date_Time",
sep=' ')
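Recent pandas releases deprecate `date_parser` and the nested-list form of `parse_dates` used above. The following is a minimal sketch of an equivalent load that parses all timestamps in one vectorized `pd.to_datetime` call. It assumes Date is `YYYYMMDD` and Time is `HHMM` (leading zero optional). The two-row sample is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the question's space-separated layout.
# Assumed: Date is YYYYMMDD and Time is HHMM (leading zero optional).
sample = io.StringIO(
    "19980102 0930 8.70630 8.70630 8.70630 8.70630 420.73 4 0 0\n"
    "19980102 935 8.82514 8.82514 8.82514 8.82514 420.73 4 0 0\n"
)

names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume',
         'Split Factor', 'Earnings', 'Dividends']

# Keep Date and Time as strings so they can be concatenated verbatim.
minData = pd.read_csv(sample, header=None, names=names, sep=' ',
                      dtype={'Date': str, 'Time': str})

# Pad Time to four digits ("935" -> "0935"), join it with Date, and parse
# the whole column at once with an explicit format.
stamps = pd.to_datetime(minData['Date'] + ' ' + minData['Time'].str.zfill(4),
                        format='%Y%m%d %H%M')
minData = minData.drop(columns=['Date', 'Time'])
minData.index = pd.DatetimeIndex(stamps, name='Date_Time')
```

Passing an explicit `format` also avoids a per-row Python lambda, which matters for a file with over a million rows.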
The data looks like this:
>>> minData.index
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-01-02 09:30:00, ..., 2013-12-09 16:00:00]
Length: 1373036, Freq: None, Timezone: None
>>>
>>> minData.head(5)
Open High Low Close Volume \
Date_Time
1998-01-02 09:30:00 8.70630 8.70630 8.70630 8.70630 420.73
1998-01-02 09:35:00 8.82514 8.82514 8.82514 8.82514 420.73
1998-01-02 09:42:00 8.79424 8.79424 8.79424 8.79424 420.73
1998-01-02 09:43:00 8.76572 8.76572 8.76572 8.76572 1262.19
1998-01-02 09:44:00 8.76572 8.76572 8.76572 8.76572 420.73
Split Factor Earnings Dividends Active
Date_Time
1998-01-02 09:30:00 4 0 0 NaN
1998-01-02 09:35:00 4 0 0 NaN
1998-01-02 09:42:00 4 0 0 NaN
1998-01-02 09:43:00 4 0 0 NaN
1998-01-02 09:44:00 4 0 0 NaN
[5 rows x 9 columns]
I can select rows from my DataFrame like this:
>>> minData["2004-12-20"]
Open High Low Close Volume \
Date_Time
2004-12-20 09:30:00 35.8574 35.9373 35.8025 35.9273 154112.00
2004-12-20 09:31:00 35.8924 35.9174 35.8824 35.8874 17021.50
2004-12-20 09:32:00 35.8874 35.8924 35.8824 35.8824 17079.50
2004-12-20 09:33:00 35.8874 35.9423 35.8724 35.9373 32491.50
2004-12-20 09:34:00 35.9373 36.0023 35.9174 36.0023 40096.40
2004-12-20 09:35:00 35.9923 36.2071 35.9923 36.1471 67088.90
...
My date looks like this (it is read from a different file):
>>> ts
Timestamp('2004-12-20 00:00:00', tz=None)
>>>
I want to set the 'Active' column to True for every minute of that day.
I can do that with this:
minData.loc['2004-12-20',"Active"] = True
And I can do the same thing with my Timestamp date using this crazy piece of code:
minData.loc[str(ts.year) + "-" + str(ts.month) + "-" + str(ts.day),"Active"] = True
Yes, that's building a string out of a Timestamp object!
I know there has to be a better way to do this.
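Two ways to avoid assembling the string by hand are sketched below on a small made-up frame standing in for `minData` (its index and `Close` column are hypothetical). The first lets the Timestamp format itself with `strftime`, which also zero-pads the month and day. The second skips strings entirely and uses a half-open boolean mask covering [midnight, next midnight).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for minData: five minutes straddling midnight.
minData = pd.DataFrame(
    {'Close': np.arange(5.0)},
    index=pd.date_range('2004-12-20 23:58', periods=5, freq='min'),
)
minData['Active'] = False

ts = pd.Timestamp('2004-12-20')

# Option 1: let the Timestamp produce its own zero-padded date string,
# then rely on partial-string indexing with .loc.
by_string = minData.copy()
by_string.loc[ts.strftime('%Y-%m-%d'), 'Active'] = True

# Option 2: no strings at all; select [midnight, next midnight) with a mask.
day_start = ts.normalize()
day_end = day_start + pd.Timedelta(days=1)
by_mask = minData.copy()
in_day = (by_mask.index >= day_start) & (by_mask.index < day_end)
by_mask.loc[in_day, 'Active'] = True
```

The mask version also works on an unsorted index, where partial-string slicing can be unreliable.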
Answer (score: 5)
I would do it like this:
In [20]: df = DataFrame(np.random.randn(10,1),index=date_range('20130101 23:55:00',periods=10,freq='T'))
In [21]: df['Active'] = False
In [22]: df
Out[22]:
0 Active
2013-01-01 23:55:00 0.273194 False
2013-01-01 23:56:00 2.869795 False
2013-01-01 23:57:00 0.980566 False
2013-01-01 23:58:00 0.176711 False
2013-01-01 23:59:00 -0.354976 False
2013-01-02 00:00:00 0.258194 False
2013-01-02 00:01:00 -1.765781 False
2013-01-02 00:02:00 0.106163 False
2013-01-02 00:03:00 -1.169214 False
2013-01-02 00:04:00 0.224484 False
[10 rows x 2 columns]
In [28]: df['Active'] = False
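As @Andy Hayden pointed out, normalize() sets the time of each timestamp to midnight (00:00:00), so you can compare the result directly against a Timestamp whose time is 00:00:00.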
In [34]: df.loc[df.index.normalize() == Timestamp('20130102'),'Active'] = True
In [35]: df
Out[35]:
0 Active
2013-01-01 23:55:00 0.273194 False
2013-01-01 23:56:00 2.869795 False
2013-01-01 23:57:00 0.980566 False
2013-01-01 23:58:00 0.176711 False
2013-01-01 23:59:00 -0.354976 False
2013-01-02 00:00:00 0.258194 True
2013-01-02 00:01:00 -1.765781 True
2013-01-02 00:02:00 0.106163 True
2013-01-02 00:03:00 -1.169214 True
2013-01-02 00:04:00 0.224484 True
[10 rows x 2 columns]
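The comparison above works because `Timestamp('20130102')` is already at midnight. If the date read from the other file might carry a time component, normalizing both sides keeps the comparison date-only. This is a minimal sketch; the frame and the 14:30 time on `ts` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: five minutes straddling midnight.
df = pd.DataFrame({'x': np.arange(5.0)},
                  index=pd.date_range('2013-01-01 23:58', periods=5, freq='min'))
df['Active'] = False

# Hypothetical: the externally read date is not exactly midnight.
ts = pd.Timestamp('2013-01-02 14:30')

# Normalize both sides so only the calendar date is compared.
same_day = df.index.normalize() == ts.normalize()
df.loc[same_day, 'Active'] = True
```

For a tz-naive index, `df.index.floor('D')` gives the same result as `df.index.normalize()`.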
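For very fine-grained control, do this (you can use indexer_at_time if you only need a single time of day as the indexer). You can always combine conditions with & to do more complex indexing. Note that indexer_between_time returns integer positions rather than labels, and it matches on time of day only, ignoring the date; when the start time is later than the end time, the range wraps around midnight.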
In [29]: df.iloc[df.index.indexer_between_time('23:59', '00:03'), df.columns.get_loc('Active')] = True
In [30]: df
Out[30]:
0 Active
2013-01-01 23:55:00 0.273194 False
2013-01-01 23:56:00 2.869795 False
2013-01-01 23:57:00 0.980566 False
2013-01-01 23:58:00 0.176711 False
2013-01-01 23:59:00 -0.354976 True
2013-01-02 00:00:00 0.258194 True
2013-01-02 00:01:00 -1.765781 True
2013-01-02 00:02:00 0.106163 True
2013-01-02 00:03:00 -1.169214 True
2013-01-02 00:04:00 0.224484 False
[10 rows x 2 columns]
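Because indexer_between_time looks only at the time of day, on a frame spanning several days it selects the same window on every night. The sketch below contrasts that with an exact datetime window built from two comparisons joined by &. The two-night frame and the 23:59 to 00:01 window are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the same five minutes around midnight on two nights.
idx = pd.date_range('2013-01-01 23:58', periods=5, freq='min').append(
    pd.date_range('2013-01-02 23:58', periods=5, freq='min'))
df = pd.DataFrame({'x': np.arange(10.0)}, index=idx)
df['Active'] = False

# Time-of-day window: returns integer positions, so assign through .iloc.
# start > end, so the window wraps past midnight; the date is ignored.
pos = df.index.indexer_between_time('23:59', '00:01')
by_time = df.copy()
by_time.iloc[pos, by_time.columns.get_loc('Active')] = True

# Exact datetime window: combine two comparisons with &.
start = pd.Timestamp('2013-01-01 23:59')
end = pd.Timestamp('2013-01-02 00:01')
by_window = df.copy()
by_window.loc[(df.index >= start) & (df.index <= end), 'Active'] = True
```

If you only need the matching rows rather than an assignment, `DataFrame.between_time('23:59', '00:01')` returns them directly.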