使用句点选择/切片多索引数据帧时间序列会导致错误吗?

时间:2016-08-19 13:35:17

标签: python pandas time-series slice multi-index

我有一个多索引,第一级索引是一个时间序列,其方式与下一个相同:

In[168]: rng = pd.date_range('01-01-2000',periods=50,freq='M')

In[169]: long_df = pd.DataFrame(np.random.randn(50,4),index = rng, columns=['bar','baz','foo','zoo'])

In[170]: long_df = long_df.stack()

In[171]: long_df[:10]
Out[171]: 

2000-01-31  bar    2.079474
            baz   -0.569920
            foo    1.149012
            zoo   -0.228926
2000-02-29  bar    0.429502
            baz   -0.117166
            foo    0.956546
            zoo   -1.483818
2000-03-31  bar   -1.137998
            baz    1.049849

编辑

我可以使用句点切片,但效果很好:

In[172]: long_df = long_df.sort_index()

In[173]: long_df.loc['2001']
Out[173]: 
2001-01-31  bar   -0.193987
            baz    0.769297
            foo    0.286880
            zoo   -1.431313
2001-02-28  bar   -0.840502
            baz    1.786758
            foo    0.878356
            zoo    0.433383
2001-03-31  bar    0.897548
            baz    1.901540
            foo    0.110606
            zoo    0.571267
2001-04-30  bar   -0.375377
            baz    1.423742
            foo   -0.415006
            zoo   -0.141000
(...)

然而,当我使用multiindex版本时,我正在使用切片未被确认:

In[204]: dfmi
Out[204]: 
                      Last  Days to expiry
Date       Ticker                         
1988-12-06 HGF89   1.46894              52
           HGF90   1.17100             419
           HGG89   1.42100              80
           HGH89   1.37344             113
           HGH90   1.17450             477
           HGK89   1.28750             171
           HGK90   1.15900             539
           HGN89   1.24550             233
           HGN90   1.15900             598
           HGU89   1.21750             295
           HGU90   1.15900             659
           HGZ89   1.18500             386
1988-12-07 HGF89   1.51900              51
           HGF90   1.18900             418
           HGG89   1.46394              79
           HGH89   1.41300             112
           HGH90   1.19250             476
           HGK89   1.31750             170
           HGK90   1.17700             538
           HGN89   1.27550             232
           HGN90   1.17700             597
           HGU89   1.24250             294
           HGU90   1.17700             658
           HGZ89   1.20300             385
1988-12-08 HGF89   1.58100              50
           HGF90   1.18900             417
           HGG89   1.50894              78
           HGH89   1.43994             111
           HGH90   1.19250             475
           HGK89   1.32750             169
                   ...             ...
2016-07-05 HGK7    2.20500             325
           HGM7    2.20900             358
           HGN6    2.18150              22
           HGN7    2.21000             387
           HGQ6    2.18150              55
           HGQ7    2.21450             420
           HGU6    2.18350              85
           HGU7    2.21550             449
           HGV6    2.18700             114
           HGV7    2.21850             479
           HGX6    2.19100             146
           HGX7    2.22000             511
           HGZ6    2.19250             176
2016-07-06 HGF7    2.16700             205
           HGG7    2.17100             233
           HGH7    2.17100             266
           HGJ7    2.17550             294
           HGK7    2.17650             324
           HGM7    2.18050             357
           HGN6    2.15150              21
           HGN7    2.18150             386
           HGQ6    2.15150              54
           HGQ7    2.18600             419
           HGU6    2.15350              84
           HGU7    2.18700             448
           HGV6    2.15700             113
           HGV7    2.19000             478
           HGX6    2.16100             145
           HGX7    2.19150             510
           HGZ6    2.16300             175

[167701 rows x 2 columns]

In[204]: dfmi = dfmi.sort_index()

In[205]: dfmi.loc['2001']
Out[206]: 
                      Last  Days to expiry
Date       Ticker                         
1988-12-06 HGF89   1.46894              52
           HGF90   1.17100             419
           HGG89   1.42100              80
           HGH89   1.37344             113
           HGH90   1.17450             477
           HGK89   1.28750             171
           HGK90   1.15900             539
           HGN89   1.24550             233
           HGN90   1.15900             598
           HGU89   1.21750             295
           HGU90   1.15900             659
1988-12-07 HGF89   1.51900              51
           HGF90   1.18900             418
           HGG89   1.46394              79
           HGH89   1.41300             112
           HGH90   1.19250             476
           HGK89   1.31750             170
           HGK90   1.17700             538
           HGN89   1.27550             232
           HGN90   1.17700             597
           HGU89   1.24250             294
           HGU90   1.17700             658
1988-12-08 HGF89   1.58100              50
           HGF90   1.18900             417
           HGG89   1.50894              78
           HGH89   1.43994             111
           HGH90   1.19250             475
           HGK89   1.32750             169
           HGK90   1.17700             537
           HGN89   1.27750             231
                   ...             ...
2016-07-05 HGH7    2.19950             267
           HGJ7    2.20400             295
           HGK7    2.20500             325
           HGM7    2.20900             358
           HGN6    2.18150              22
           HGN7    2.21000             387
           HGQ6    2.18150              55
           HGQ7    2.21450             420
           HGU6    2.18350              85
           HGU7    2.21550             449
           HGV6    2.18700             114
           HGV7    2.21850             479
           HGX6    2.19100             146
           HGX7    2.22000             511
2016-07-06 HGF7    2.16700             205
           HGG7    2.17100             233
           HGH7    2.17100             266
           HGJ7    2.17550             294
           HGK7    2.17650             324
           HGM7    2.18050             357
           HGN6    2.15150              21
           HGN7    2.18150             386
           HGQ6    2.15150              54
           HGQ7    2.18600             419
           HGU6    2.15350              84
           HGU7    2.18700             448
           HGV6    2.15700             113
           HGV7    2.19000             478
           HGX6    2.16100             145
           HGX7    2.19150             510

[161017 rows x 2 columns]

我注意到我作为示例提供的long_df(pandas.core.series.Series)与我使用的df(pandas.core.frame.DataFrame)之间的类型存在差异

这样做的正确方法是什么?

非常感谢你的提示,

1 个答案:

答案 0 :(得分:3)

您需要添加loc,但需要pandas 0.18.1的最新版本:

print (long_df.loc['2001'])

2001-01-31  bar    1.684425
            baz    1.215258
            foo    0.158968
            zoo    0.689477
2001-02-28  bar   -0.123582
            baz    0.312533
            foo    0.609169
            zoo   -0.093985
2001-03-31  bar    0.372093
            baz   -0.281191
            foo   -0.400354
            zoo    0.646965
2001-04-30  bar   -0.287488
            baz   -0.928941
            foo    1.365416
            zoo    0.267282
2001-05-31  bar   -1.021086
            baz    0.317819
            foo   -0.393135
            zoo   -0.213589
2001-06-30  bar   -2.594173
...
...

编辑:

另一个解决方案是来自第一级get_level_values get_locconcat用于查找整数索引:

import pandas as pd

long_df = pd.read_csv('test/testslice.csv', parse_dates=[0], index_col=[0,1])
dfmi = long_df.stack().sort_index()

print (dfmi.index.get_level_values(0))
DatetimeIndex(['1988-12-06', '1988-12-06', '1988-12-06', '1988-12-06',
               '1988-12-06', '1988-12-06', '1988-12-06', '1988-12-06',
               '1988-12-06', '1988-12-06',
               ...
               '2016-07-06', '2016-07-06', '2016-07-06', '2016-07-06',
               '2016-07-06', '2016-07-06', '2016-07-06', '2016-07-06',
               '2016-07-06', '2016-07-06'],
              dtype='datetime64[ns]', name='Date', length=335402, freq=None)

print (dfmi.index.get_level_values(0).get_loc('2001'))
slice(121844, 133684, None)
print (dfmi.iloc[dfmi.index.get_level_values(0).get_loc('2001')])

Date        Ticker                
2001-01-02  HGF01   Last                0.8180
                    Days to expiry     27.0000
            HGF02   Last                0.8180
                    Days to expiry    392.0000
            HGG01   Last                0.8165
                    Days to expiry     55.0000
            HGG02   Last                0.8180
                    Days to expiry    420.0000
            HGH01   Last                0.8115
                    Days to expiry     85.0000
            HGH02   Last                0.8180
                    Days to expiry    448.0000
            HGJ01   Last                0.8125
                    Days to expiry    114.0000
            HGJ02   Last                0.8170
                    Days to expiry    479.0000
            HGK01   Last                0.8135
                    Days to expiry    147.0000
            HGK02   Last                0.8160
                    Days to expiry    512.0000
            HGM01   Last                0.8145
                    Days to expiry    176.0000
            HGM02   Last                0.8155
                    Days to expiry    540.0000
            HGN01   Last                0.8155
                    Days to expiry    206.0000
            HGN02   Last                0.8140
                    Days to expiry    573.0000
            HGQ01   Last                0.8160
                    Days to expiry    239.0000
                                        ...   
2001-12-31  HGK03   Last                0.6960
                    Days to expiry    513.0000
            HGM02   Last                0.6680
                    Days to expiry    177.0000
            HGM03   Last                0.6980
                    Days to expiry    542.0000
            HGN02   Last                0.6710
                    Days to expiry    210.0000
            HGN03   Last                0.7005
                    Days to expiry    575.0000
            HGQ02   Last                0.6740
                    Days to expiry    240.0000
            HGQ03   Last                0.7030
                    Days to expiry    604.0000
            HGU02   Last                0.6770
                    Days to expiry    269.0000
            HGU03   Last                0.7050
                    Days to expiry    634.0000
            HGV02   Last                0.6795
                    Days to expiry    302.0000
            HGV03   Last                0.7080
                    Days to expiry    667.0000
            HGX02   Last                0.6820
                    Days to expiry    329.0000
            HGX03   Last                0.7110
                    Days to expiry    694.0000
            HGZ02   Last                0.6850
                    Days to expiry    361.0000
            HGZ03   Last                0.7140
                    Days to expiry    728.0000
dtype: float64

EDIT1评论:

不幸的是,如果需要按范围选择,我只有列表推导的慢解决方案和https://sourceforge.net/projects/direct264

print (list(range(1993, 2003)))
[1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002]

dfs = [dfmi.iloc[dfmi.index.get_level_values(0).get_loc(str(x))] for x in range(1993, 2003)]
print (pd.concat(dfs))

1993-01-01 00:00:00  bar    0.080676
                     baz    0.315925
                     foo   -1.484132
                     zoo   -0.977202
1993-01-01 01:00:00  bar    0.817846
                     baz   -1.280649
                     foo    0.727975
                     zoo   -0.062142
1993-01-01 02:00:00  bar    1.278623
                     baz    0.268865
                     foo   -0.183612
                     zoo    0.194996
1993-01-01 03:00:00  bar   -0.304734
                     baz   -0.227468
                     foo   -0.134305
                     zoo    0.887374
1993-01-01 04:00:00  bar   -0.166669
                     baz   -0.132718
                     foo   -0.624932
                     zoo    1.959724
1993-01-01 05:00:00  bar   -1.379774
                     baz   -0.738452
                     foo    0.398924
                     zoo    0.005612
1993-01-01 06:00:00  bar   -0.864205
                     baz   -0.813321
                     foo    0.931858
                     zoo   -1.005977
1993-01-01 07:00:00  bar    0.667380
                     baz   -1.208457
                              ...   
2002-10-30 08:00:00  foo    0.311835
                     zoo    0.611802
2002-10-30 09:00:00  bar    2.615050
                     baz   -0.291767
                     foo   -0.508202
                     zoo    0.443429
2002-10-30 10:00:00  bar   -1.724252
                     baz   -0.126579
                     foo    1.108530
                     zoo   -0.553025
2002-10-30 11:00:00  bar    1.208705
                     baz   -1.561024
                     foo    0.722768
                     zoo    1.893419
2002-10-30 12:00:00  bar    0.239383
                     baz   -0.543053
                     foo   -0.687370
                     zoo    0.848929
2002-10-30 13:00:00  bar    0.897465
                     baz    0.631292
                     foo    0.068200
                     zoo   -1.579010
2002-10-30 14:00:00  bar   -0.996531
                     baz   -1.208318
                     foo    0.174970
                     zoo   -0.780913
2002-10-30 15:00:00  bar    0.237465
                     baz    0.380585
                     foo   -1.646285
                     zoo   -0.730744
dtype: float64