Question

In Python, using the pandas library, I want to convert my minute-level data to daily data.

After loading the data (from a csv) and setting a DatetimeIndex as the index, the object xx looks like this:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 540949 entries, 2007-01-02 09:46:00+08:00 to 2013-10-17 16:15:00+08:00 
Data columns (total 5 columns):
Open      540949  non-null values
High      540949  non-null values
Low       540949  non-null values
Close     540949  non-null values
Volume    540949  non-null values
dtypes: int64(5)

I now want to convert this intraday data to daily OHLC data.

My initial attempt looks like this:

xx['date'] = [i.date() for i in xx.index]
xx['dailyOpen'] = xx.groupby('date').Open.transform(lambda s: s[0])
xx['dailyHigh'] = xx.groupby('date').High.transform(lambda s: s.max())
xx['dailyLow'] = xx.groupby('date').Low.transform(lambda s: s.min())
xx['dailyClose'] = xx.groupby('date').Close.transform(lambda s: s[len(s)-1])
dd  = xx.groupby('date').tail(1)[['dailyOpen','dailyHigh','dailyLow','dailyClose']]
dd.head()

Is there a more efficient/elegant way to do this?

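Note

I just found this method... it's much neater... but is there a way to do this using resample? And with this approach, could I convert a one-minute OHLC time series into a 15-minute OHLC time series? (N.B. there may be some missing minutes, so splitting every 15 rows won't work...)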

def ohlcsum(df):
    df = df.sort()
    return {
       'Open': df['Open'][0],
       'High': df['High'].max(),
       'Low': df['Low'].min(),
       'Close': df['Close'][-1],
       'Volume': df['Volume'].sum()
      }

xx.groupby('date').agg(ohlcsum)

I would also prefer not to have to create the date column by writing something like

xx['date'] = [i.date() for i in xx.index]

Is it possible to group by date using the TimeGrouper('1D') functionality?

I tried the following, but for some reason it doesn't seem to work...

xx.groupby(TimeGrouper('1D')).agg(ohlcsum)

Any help is greatly appreciated...

Answer 1

In master / 0.13 (to be released very soon), you can do this (in 0.12 it is a bit more manual, because you have to do it on each Series separately):

In [7]: df = DataFrame(np.random.randn(10000,2),index=date_range('20130101 09:00:00',periods=10000,freq='1Min'),columns=['last','volume'])

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10000 entries, 2013-01-01 09:00:00 to 2013-01-08 07:39:00
Freq: T
Data columns (total 2 columns):
last      10000  non-null values
volume    10000  non-null values
dtypes: float64(2)
In [9]: df.resample('1D',how='ohlc')
Out[9]: 
                last                                  volume                              
                open      high       low     close      open      high       low     close
2013-01-01  0.801982  3.343166 -3.203291 -0.361502  0.255356  2.723863 -3.319414  1.073376
2013-01-02  0.101687  3.378843 -3.219792 -1.121900  1.226099  4.103099 -3.463014 -0.452594
2013-01-03 -0.051806  4.290010 -4.099700 -0.637321  0.713189  3.622728 -3.236652 -0.104458
2013-01-04  0.821215  3.058024 -3.907862 -1.595449  0.836234  2.821551 -3.191774 -0.399603
2013-01-05  0.084973  3.458210 -3.191455  1.426380 -0.402435  2.777447 -2.966165  1.227398
2013-01-06 -0.669922  3.232865 -3.902237  1.846017 -0.440055  3.088109 -3.710640  3.066725
2013-01-07 -0.122727  3.300163 -3.315501  1.718163  1.085066  3.373251 -4.029679  0.187828
2013-01-08  0.311785  3.073488 -3.013702 -0.627721 -0.502258  2.795292 -2.772738 -0.654676

[8 rows x 8 columns]
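
A note for readers on later pandas releases: the `how=` keyword was subsequently deprecated and then removed from `resample`, and the aggregation is called as a method on the resampler instead. Here is a minimal sketch of the same idea, assuming a recent pandas. The seeded random data is made up for illustration and will not reproduce the numbers shown above.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute data, shaped like the frame built in In [7] above.
rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01 09:00:00", periods=10000, freq="min")
df = pd.DataFrame(rng.standard_normal((10000, 2)), index=idx,
                  columns=["last", "volume"])

# Method-style spelling of df.resample('1D', how='ohlc').
daily = df.resample("1D").ohlc()

# Columns are a two-level index: (source column, open/high/low/close).
print(daily.shape)
print(daily["last"].head())
```

As in the session above, every column gets its own open/high/low/close block, which is what you want for a raw price series but not for a column such as volume.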

This will work in 0.12:

pd.concat(dict([ (k,df[k].resample('1D',how='ohlc')) for k in df.columns ]),axis=1)
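
Since the frame in the question already has Open/High/Low/Close/Volume columns, a different aggregation per column is probably closer to what was asked than a full OHLC block for each column. The sketch below assumes a recent pandas, where `pd.Grouper(freq=...)` plays the role that `TimeGrouper` had. The frame `xx` here is a hypothetical stand-in with invented values, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame xx: one-minute bars.
idx = pd.date_range("2007-01-02 09:46", periods=3000, freq="min")
rng = np.random.default_rng(1)
price = 1000 + rng.integers(-5, 6, size=len(idx)).cumsum()
xx = pd.DataFrame(
    {"Open": price, "High": price + 2, "Low": price - 2, "Close": price + 1,
     "Volume": rng.integers(1, 100, size=len(idx))},
    index=idx,
)

# One aggregation rule per column.
ohlc_rules = {"Open": "first", "High": "max", "Low": "min",
              "Close": "last", "Volume": "sum"}

# Group by calendar day straight from the index; no helper 'date' column needed.
daily = xx.groupby(pd.Grouper(freq="1D")).agg(ohlc_rules)

# The resample spelling is expected to give the same daily bars.
same = xx.resample("1D").agg(ohlc_rules)
print(daily)
```

Either spelling avoids both the `date` helper column and the custom `ohlcsum` function.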

Answer 2

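I'm pretty new to pandas and Python, but I came up with this, which allows conversion to any time period.

In my example, minData is minute data, stored in a flat format without any commas. My data came from quantquote.com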

import datetime

import pandas as pd
from pandas import DataFrame

columnHeadings = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends']

minData = pd.read_csv(
    filename,
    header = None,
    names = columnHeadings, 
    parse_dates = [["Date", "Time"]],
    date_parser = lambda x: datetime.datetime.strptime(x, '%Y%m%d %H%M'), 
    index_col = "Date_Time",
    sep=' ')

xx = minData.to_period(freq="min")

openCol = DataFrame(xx.Open)
openCol2 = openCol.resample("M", how = 'first')

highCol = DataFrame(xx.High)
highCol2 = highCol.resample("M", how = 'max')

lowCol = DataFrame(xx.Low)
lowCol2 = lowCol.resample("M", how = 'min')

closeCol = DataFrame(xx.Close)
closeCol2 = closeCol.resample("M", how = 'last')

volumeCol = DataFrame(xx.Volume)
volumeCol2 = volumeCol.resample("M", how = 'sum')

#splitFactorCol = DataFrame(xx.SplitFactor)
#splitFactorCol.resample("M", how = 'first')


monthlyData = DataFrame(openCol2.Open)

monthlyData["High"] = highCol2.High
monthlyData["Low"] = lowCol2.Low
monthlyData["Close"] = closeCol2.Close
monthlyData["Volume"] = volumeCol2.Volume

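I'm sure there must be a neater way, but this works with the data I have, and it lets me use the same code to generate 15-minute, 1-hour, daily, weekly and monthly data. It's fast.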
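
One possible tidier version of the same idea, sketched under the assumption of a recent pandas: a single helper that takes the target frequency as a parameter. The helper name `resample_ohlc`, the `OHLC_RULES` mapping and the sample bars are all made up for illustration. Because bins are defined by clock time and not by row count, missing minutes do not shift the bar boundaries.

```python
import pandas as pd

# One aggregation rule per column (names follow the question's columns).
OHLC_RULES = {"Open": "first", "High": "max", "Low": "min",
              "Close": "last", "Volume": "sum"}

def resample_ohlc(bars, rule):
    """Aggregate OHLCV bars to the coarser frequency given by `rule`."""
    out = bars.resample(rule).agg(OHLC_RULES)
    # Bins that contain no source rows come back with NaN prices; drop them.
    return out.dropna(subset=["Open"])

# Minute bars with gaps: 09:02, 09:03 and most later minutes are missing.
idx = pd.to_datetime([
    "2013-01-02 09:00", "2013-01-02 09:01", "2013-01-02 09:04",
    "2013-01-02 09:14", "2013-01-02 09:50",
])
bars = pd.DataFrame(
    {"Open": [10, 11, 12, 13, 14],
     "High": [12, 13, 15, 14, 16],
     "Low": [9, 10, 11, 12, 13],
     "Close": [11, 12, 13, 14, 15],
     "Volume": [100, 200, 300, 400, 500]},
    index=idx,
)

bars_15min = resample_ohlc(bars, "15min")  # bars starting 09:00 and 09:45
bars_daily = resample_ohlc(bars, "1D")     # a single bar for 2013-01-02
print(bars_15min)
print(bars_daily)
```

Dropping the empty bins is a design choice; if you would rather keep a row for every interval, skip the `dropna` and fill the gaps however suits your data.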

Any improvements/comments would be much appreciated.

Kind regards,

-Jason

Is there a more efficient way to convert the periodicity of an intraday OHLC DataFrame in Python
