Select a DataFrame slice by date string

Time: 2014-03-27 21:06:27

Tags: python pandas

I have a DataFrame that is loaded like this:

        minData = pd.read_csv(
                currentSymbol["fullpath"],
                header = None,
                names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends'], 
                parse_dates = [["Date", "Time"]],
                date_parser = lambda x : datetime.datetime.strptime(x, '%Y%m%d %H%M'), 
                index_col = "Date_Time",
                sep=' ')

The data looks like this:

>>> minData.index
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-01-02 09:30:00, ..., 2013-12-09 16:00:00]
Length: 1373036, Freq: None, Timezone: None
>>> 

>>> minData.head(5)
                        Open     High      Low    Close   Volume  \
Date_Time                                                          
1998-01-02 09:30:00  8.70630  8.70630  8.70630  8.70630   420.73   
1998-01-02 09:35:00  8.82514  8.82514  8.82514  8.82514   420.73   
1998-01-02 09:42:00  8.79424  8.79424  8.79424  8.79424   420.73   
1998-01-02 09:43:00  8.76572  8.76572  8.76572  8.76572  1262.19   
1998-01-02 09:44:00  8.76572  8.76572  8.76572  8.76572   420.73   

                     Split Factor  Earnings  Dividends  Active  
Date_Time                                                       
1998-01-02 09:30:00             4         0          0     NaN  
1998-01-02 09:35:00             4         0          0     NaN  
1998-01-02 09:42:00             4         0          0     NaN  
1998-01-02 09:43:00             4         0          0     NaN  
1998-01-02 09:44:00             4         0          0     NaN  

[5 rows x 9 columns]

I can select rows from my DataFrame like this:

>>> minData["2004-12-20"]
                        Open     High      Low    Close     Volume  \
Date_Time                                                            
2004-12-20 09:30:00  35.8574  35.9373  35.8025  35.9273  154112.00   
2004-12-20 09:31:00  35.8924  35.9174  35.8824  35.8874   17021.50   
2004-12-20 09:32:00  35.8874  35.8924  35.8824  35.8824   17079.50   
2004-12-20 09:33:00  35.8874  35.9423  35.8724  35.9373   32491.50   
2004-12-20 09:34:00  35.9373  36.0023  35.9174  36.0023   40096.40   
2004-12-20 09:35:00  35.9923  36.2071  35.9923  36.1471   67088.90   
...

My date looks like this (it is read from a different file):

>>> ts
Timestamp('2004-12-20 00:00:00', tz=None)
>>> 

I want to set the 'Active' column to True for every minute of this day.

I can do that with this:

minData.loc['2004-12-20',"Active"] = True

I can do the same thing with my Timestamp date using this crazy code:

minData.loc[str(ts.year) + "-" + str(ts.month) + "-" + str(ts.day),"Active"] = True

Yes, that is building a string from a Timestamp object!

I know there must be a better way to do this.

1 answer:

Answer 0 (score: 5)

I would do it like this:

In [20]: df = DataFrame(np.random.randn(10,1),index=date_range('20130101 23:55:00',periods=10,freq='T'))

In [21]: df['Active'] = False

In [22]: df
Out[22]: 
                            0 Active
2013-01-01 23:55:00  0.273194  False
2013-01-01 23:56:00  2.869795  False
2013-01-01 23:57:00  0.980566  False
2013-01-01 23:58:00  0.176711  False
2013-01-01 23:59:00 -0.354976  False
2013-01-02 00:00:00  0.258194  False
2013-01-02 00:01:00 -1.765781  False
2013-01-02 00:02:00  0.106163  False
2013-01-02 00:03:00 -1.169214  False
2013-01-02 00:04:00  0.224484  False

[10 rows x 2 columns]


In [28]: df['Active'] = False

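As @Andy Hayden points out, normalize sets the time of every index entry to 0 (midnight), so you can compare the index directly with a Timestamp whose time is also 0.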

In [34]: df.loc[df.index.normalize() == Timestamp('20130102'),'Active'] = True

In [35]: df
Out[35]: 
                            0 Active
2013-01-01 23:55:00  0.273194  False
2013-01-01 23:56:00  2.869795  False
2013-01-01 23:57:00  0.980566  False
2013-01-01 23:58:00  0.176711  False
2013-01-01 23:59:00 -0.354976  False
2013-01-02 00:00:00  0.258194   True
2013-01-02 00:01:00 -1.765781   True
2013-01-02 00:02:00  0.106163   True
2013-01-02 00:03:00 -1.169214   True
2013-01-02 00:04:00  0.224484   True

[10 rows x 2 columns]
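
Applied to the question, the same idea might look like the sketch below. It assumes a small made-up frame (the `value` column and the 10-row index are placeholders, not the asker's data) and a `ts` whose time component is midnight, as in the question. It also shows `strftime` as a tidier way to build the date string if string-based selection is preferred.

```python
import numpy as np
import pandas as pd

# Minute-level frame spanning midnight, similar to the example above.
# The "value" column is placeholder data for illustration only.
idx = pd.date_range("2013-01-01 23:55:00", periods=10, freq="min")
df = pd.DataFrame({"value": np.arange(10.0)}, index=idx)
df["Active"] = False

# A date obtained elsewhere; its time-of-day component is midnight.
ts = pd.Timestamp("2013-01-02")

# Option 1: compare the normalized index with the Timestamp directly.
mask = df.index.normalize() == ts
df.loc[mask, "Active"] = True

# Option 2: format the Timestamp once and rely on partial string indexing.
day = ts.strftime("%Y-%m-%d")
n_rows_on_day = len(df.loc[day])
```

If `ts` might carry a time of day, comparing against `ts.normalize()` instead keeps the mask from coming back empty.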

For very fine control, do this (or use indexer_at_time if you only need an indexer for one time of day). You can always combine clauses to do more complicated indexing.

In [29]: df.loc[df.index.indexer_between_time('20130101 23:59:00','20130102 00:03:00'),'Active'] = True

In [30]: df
Out[30]: 
                            0 Active
2013-01-01 23:55:00  0.273194  False
2013-01-01 23:56:00  2.869795  False
2013-01-01 23:57:00  0.980566  False
2013-01-01 23:58:00  0.176711  False
2013-01-01 23:59:00 -0.354976   True
2013-01-02 00:00:00  0.258194   True
2013-01-02 00:01:00 -1.765781   True
2013-01-02 00:02:00  0.106163   True
2013-01-02 00:03:00 -1.169214   True
2013-01-02 00:04:00  0.224484  False

[10 rows x 2 columns]
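
As a rough sketch of the "combined clauses" idea, the snippet below builds a boolean mask from two comparisons against full datetimes. It also shows `indexer_between_time` used with time-of-day strings; as far as I know it returns integer positions, so this sketch pairs it with `iloc` rather than `loc`. The frame and the `value` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Placeholder frame spanning midnight; not the asker's data.
idx = pd.date_range("2013-01-01 23:55:00", periods=10, freq="min")
df = pd.DataFrame({"value": np.arange(10.0)}, index=idx)
df["Active"] = False

# Full-datetime range: AND two comparisons into one boolean mask.
start = pd.Timestamp("2013-01-01 23:59:00")
end = pd.Timestamp("2013-01-02 00:03:00")
in_range = (df.index >= start) & (df.index <= end)
df.loc[in_range, "Active"] = True

# Time-of-day window (on any date): the result is integer positions,
# so it is used with iloc here.
positions = df.index.indexer_between_time("00:01", "00:02")
window = df.iloc[positions]
```

The mask approach handles ranges that cross midnight without special casing, because it compares complete timestamps rather than times of day.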