Pandas MultiIndex names not working

Asked: 2013-10-26 22:38:19

Tags: python pandas

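The IndexError on axis 0 below puzzles me. Where is my mistake?

It works if I don't rename the columns before setting the MultiIndex (uncomment the line df = df.set_index([0, 1]) and comment out the three lines above it). Tested with both the stable and dev versions.

I'm fairly new to Python and pandas, so any other suggestions for improvement are much appreciated.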

import itertools
import datetime as dt

import numpy as np
import pandas as pd
from pandas.io.html import read_html


dfs = read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2006-01-01/DE',
                attrs={'class': 'list hours responsive'},
                skiprows=1)

df = dfs[0]

hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, 25)]))
df[0] = hours

df = df.rename(columns={0: 'a'})
df = df.rename(columns={1: 'b'})
df = df.set_index(['a', 'b'])
#df = df.set_index([0, 1])

today = dt.datetime(2006, 1, 1)
days = pd.date_range(today, periods=len(df.columns), freq='D')

colnames = [day.strftime(format='%Y-%m-%d') for day in days]
df.columns = colnames


Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/frame.py", line 2099, in __setattr__
    super(DataFrame, self).__setattr__(name, value)
  File "properties.pyx", line 59, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:29330)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/generic.py", line 656, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 1039, in set_axis
    block.set_ref_items(self.items, maybe_rename=maybe_rename)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 93, in set_ref_items
    self.items = ref_items.take(self.ref_locs)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/index.py", line 395, in take
    taken = self.view(np.ndarray).take(indexer)
IndexError: index 7 is out of bounds for axis 0 with size 7

1 Answer:

Answer 0: (score: 1)

This is a very subtle bug. It will be fixed by https://github.com/pydata/pandas/pull/5345 in the upcoming 0.13 release (due very soon).

As a workaround, you can do this after the set_index but before the column assignment:

df = pd.DataFrame(dict([(c, col) for c, col in df.iteritems()]))

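The internal state of the frame is off; this is caused by the rename followed by set_index, so recreating the frame puts it back into a state you can work with.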
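
To show where the rebuild step sits in the sequence, here is a minimal sketch. It makes several assumptions: the small hand-made frame is a hypothetical stand-in for the scraped table, the pandas version is a recent one (where iteritems() has been replaced by items()), and the two renames are merged into a single call. On such versions the rebuild is probably unnecessary; it is included only to illustrate the order of operations.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: integer column labels 0..4.
df = pd.DataFrame([[1, 'x', 10.0, 11.0, 12.0],
                   [1, 'y', 20.0, 21.0, 22.0],
                   [2, 'x', 30.0, 31.0, 32.0]])

# Rename the first two columns, then use them as the MultiIndex.
df = df.rename(columns={0: 'a', 1: 'b'})
df = df.set_index(['a', 'b'])

# Workaround step: rebuild the frame from its own columns.
# items() is the current spelling of the older iteritems().
df = pd.DataFrame(dict((c, col) for c, col in df.items()))

# Only now assign the new column labels.
days = pd.date_range('2006-01-01', periods=len(df.columns), freq='D')
df.columns = [day.strftime('%Y-%m-%d') for day in days]
```

Each column is a Series that carries the same MultiIndex, so the rebuilt frame should keep the index and its level names.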