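Filter by dimension coordinate columns

Time: 2017-10-27 19:03:07

Tags: python python-xarray

I am trying to switch from a pandas Panel to an xarray.Dataset.

I have a Dataset created from a dictionary of DataFrames. Each DataFrame holds the data for one stock: the rows are trading dates and the columns are prices and indicators. Sample code: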

import pandas as pd
import xarray as xr

panel_dict = {}
panel_dict['AAPL'] = pd.DataFrame({'Open': [100, 105], 'Close': [104, 108],
                               'SMA200':[102, 110], 'RSI2': [11 , 14]},
                               index=['2017-09-01', '2017-09-02'])
panel_dict['AMZN'] = pd.DataFrame({'Open': [200, 180], 'Close': [190, 170],
                               'SMA200':[190, 190], 'RSI2': [11 , 15]},
                               index=['2017-09-01', '2017-09-02'])
panel_dict['AGN'] = pd.DataFrame({'Open': [300, 310], 'Close': [300, 310],
                               'SMA200':[250, 250], 'RSI2': [5 , 15]},
                               index=['2017-09-01', '2017-09-02'])

ds_full = xr.Dataset(panel_dict)

print(ds_full)

# selecting one day works
ds = ds_full.sel(dim_0 = '2017-09-02')
print(ds)

# filtering does not work
c = ds[ds['Close']>ds['SMA200']]
c = c[c['RSI2'] < 12.0 ]
c = c.sort_values(by = 'RSI2', ascending=True)

The Dataset ds_full looks like this:

<xarray.Dataset>
Dimensions:  (dim_0: 2, dim_1: 4)
Coordinates:
  * dim_0    (dim_0) object '2017-09-01' '2017-09-02'
  * dim_1    (dim_1) object 'Close' 'Open' 'RSI2' 'SMA200'
Data variables:
    AAPL     (dim_0, dim_1) int64 104 100 11 102 108 105 14 110
    AMZN     (dim_0, dim_1) int64 190 200 11 190 170 180 15 190
    AGN      (dim_0, dim_1) int64 300 300 5 250 310 310 15 250

Selecting one day of data with ds = ds_full.sel(dim_0='2017-09-02') works fine:

<xarray.Dataset>
Dimensions:  (dim_1: 4)
Coordinates:
    dim_0    <U10 '2017-09-02'
  * dim_1    (dim_1) object 'Close' 'Open' 'RSI2' 'SMA200'
Data variables:
    AAPL     (dim_1) int64 108 105 14 110
    AMZN     (dim_1) int64 170 180 15 190
    AGN      (dim_1) int64 310 310 15 250

But how do I filter on other conditions, such as 'Close' > 'SMA200' or 'RSI2' < 12? And how do I sort the result by the 'RSI2' column?

In the original code using a pandas Panel, it looked like this:

c = ds[ds['Close']>ds['SMA200']]
c = c[c['RSI2'] < 12.0 ]
c = c.sort_values(by = 'RSI2', ascending=True)
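
For comparison, here is a rough sketch of how the same screen might be expressed in xarray. It assumes the data is rearranged into a single DataArray with a ticker dimension, so that each price or indicator becomes a 1-D array over stocks. The dimension names date, field and ticker, the helper screen and its rsi_max parameter are hypothetical names chosen for illustration. The sketch uses 2017-09-01 because no stock in the sample data passes both conditions on 2017-09-02.

```python
import pandas as pd
import xarray as xr

dates = ['2017-09-01', '2017-09-02']
panel_dict = {
    'AAPL': pd.DataFrame({'Open': [100, 105], 'Close': [104, 108],
                          'SMA200': [102, 110], 'RSI2': [11, 14]}, index=dates),
    'AMZN': pd.DataFrame({'Open': [200, 180], 'Close': [190, 170],
                          'SMA200': [190, 190], 'RSI2': [11, 15]}, index=dates),
    'AGN': pd.DataFrame({'Open': [300, 310], 'Close': [300, 310],
                         'SMA200': [250, 250], 'RSI2': [5, 15]}, index=dates),
}

# Stack the per-stock frames into one array with dims (ticker, date, field).
# The dimension names are illustrative assumptions, not from the question.
da = xr.concat(
    [xr.DataArray(df.to_numpy(), dims=('date', 'field'),
                  coords={'date': list(df.index), 'field': list(df.columns)})
     for df in panel_dict.values()],
    dim=pd.Index(list(panel_dict), name='ticker'),
)


def screen(da, date, rsi_max=12.0):
    """Hypothetical helper: stocks with Close > SMA200 and RSI2 < rsi_max on one day."""
    day = da.sel(date=date)  # remaining dims: (ticker, field)
    mask = (day.sel(field='Close') > day.sel(field='SMA200')) \
        & (day.sel(field='RSI2') < rsi_max)
    # Keep only the tickers where the mask is True.
    selected = day.isel(ticker=mask.values)
    # Sort the remaining tickers by RSI2, ascending.
    return selected.sortby(selected.sel(field='RSI2'))


result = screen(da, '2017-09-01')
# AGN (RSI2 = 5) sorts ahead of AAPL (RSI2 = 11); AMZN fails Close > SMA200.
print(result['ticker'].values.tolist())
```

Making the stock its own dimension, instead of one data variable per stock, is what allows a boolean mask and sortby to act across stocks; with one variable per ticker there is no axis to filter along.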

0 answers:

No answers