Although I was able to work around the problem, I would like to understand why this error occurs.
The DataFrame:
import pandas as pd
import itertools
sl_df = pd.DataFrame(
    data=list(range(18)),
    index=pd.MultiIndex.from_tuples(
        list(itertools.product(
            ['A', 'B', 'C'],
            ['I', 'II', 'III'],
            ['x', 'y']))),
    columns=['one'])
Out:
           one
A I   x      0
      y      1
  II  x      2
      y      3
  III x      4
      y      5
B I   x      6
      y      7
  II  x      8
      y      9
  III x     10
      y     11
C I   x     12
      y     13
  II  x     14
      y     15
  III x     16
      y     17
A simple slice that works:
sl_df.loc[pd.IndexSlice['A',:,'x']]
Out:
           one
A I   x      0
  II  x      2
  III x      4
The slice that raises the error:
sl_df.loc[pd.IndexSlice[:,'II']]
Out:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-4bfd2d65fd21> in <module>()
----> 1 sl_df.loc[pd.IndexSlice[:,'II']]
...\pandas\core\indexing.pyc in __getitem__(self, key)
1470 except (KeyError, IndexError):
1471 pass
-> 1472 return self._getitem_tuple(key)
1473 else:
1474 # we by definition only have the 0th axis
...\pandas\core\indexing.pyc in _getitem_tuple(self, tup)
868 def _getitem_tuple(self, tup):
869 try:
--> 870 return self._getitem_lowerdim(tup)
871 except IndexingError:
872 pass
...\pandas\core\indexing.pyc in _getitem_lowerdim(self, tup)
977 # we may have a nested tuples indexer here
978 if self._is_nested_tuple_indexer(tup):
--> 979 return self._getitem_nested_tuple(tup)
980
981 # we maybe be using a tuple to represent multiple dimensions here
...\pandas\core\indexing.pyc in _getitem_nested_tuple(self, tup)
1056
1057 current_ndim = obj.ndim
-> 1058 obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
1059 axis += 1
1060
...\pandas\core\indexing.pyc in _getitem_axis(self, key, axis)
1909
1910 # fall thru to straight lookup
-> 1911 self._validate_key(key, axis)
1912 return self._get_label(key, axis=axis)
1913
...\pandas\core\indexing.pyc in _validate_key(self, key, axis)
1796 raise
1797 except:
-> 1798 error()
1799
1800 def _is_scalar_access(self, key):
...\pandas\core\indexing.pyc in error()
1783 raise KeyError(u"the label [{key}] is not in the [{axis}]"
1784 .format(key=key,
-> 1785 axis=self.obj._get_axis_name(axis)))
1786
1787 try:
KeyError: u'the label [II] is not in the [columns]'
Workaround (or perhaps the correct way when there is a ':' on the first level of the index):
sl_df.loc[pd.IndexSlice[:,'II'],:]
Out:
        one
A II x    2
     y    3
B II x    8
     y    9
C II x   14
     y   15
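Question: why must ':' be specified on axis 1 only when ':' is used on the first level of the MultiIndex? Don't you agree it is a bit quirky that this works when ':' is on another level but not when it is on the first level of the MultiIndex (see the simple slice above)?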
Answer 0 (score: 1)
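According to the current pandas documentation, indexing with slicers requires specifying both axes in the .loc method.
The rationale is that, if both axes are not specified, it can be ambiguous which axis the selection applies to.
I don't know pandas' internals well, but in your specific case, when you write sl_df.loc[pd.IndexSlice[:,'II']], the ':' is dispatched to the row axis (i.e. select all rows) and 'II' to the columns, hence the error: KeyError: u'the label [II] is not in the [columns]'.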
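To make the ambiguity concrete, here is a minimal sketch that rebuilds the frame from the question. It shows that pd.IndexSlice only produces a plain tuple, so a two-element key on a two-dimensional frame can be read as (rows, columns). The variable names key, rows_explicit and rows_axis0 are illustrative and not from the original post. A three-element key such as pd.IndexSlice['A',:,'x'] cannot be read that way, which is presumably why it works without the trailing ':'.

```python
import itertools

import pandas as pd

# Same frame as in the question.
sl_df = pd.DataFrame(
    data=list(range(18)),
    index=pd.MultiIndex.from_tuples(
        list(itertools.product(["A", "B", "C"], ["I", "II", "III"], ["x", "y"]))
    ),
    columns=["one"],
)

# pd.IndexSlice just builds an ordinary tuple; it carries no axis information.
key = pd.IndexSlice[:, "II"]
print(key)  # (slice(None, None, None), 'II')

# Unambiguous form: the first element indexes the rows (the MultiIndex),
# the second element indexes the columns.
rows_explicit = sl_df.loc[key, :]

# Alternative: pin .loc to axis 0 so the whole tuple applies to the row index.
rows_axis0 = sl_df.loc(axis=0)[:, "II"]

print(rows_explicit)
```

Either form states which axis the slicer belongs to, so the result should not depend on how a given pandas version resolves the bare two-element key.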
Answer 1 (score: 0)
Because the MultiIndex is stored in the df like this: [(A,I,x),(A,I,y)...(C,III,x),(C,III,y)]
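As a rough illustration of that tuple layout, the sketch below prints the first few index entries of the frame from the question and then selects the 'II' rows by naming the level explicitly, which avoids the row/column ambiguity altogether. The names first_entries, by_xs and by_mask are illustrative only.

```python
import itertools

import pandas as pd

# Same frame as in the question.
sl_df = pd.DataFrame(
    data=list(range(18)),
    index=pd.MultiIndex.from_tuples(
        list(itertools.product(["A", "B", "C"], ["I", "II", "III"], ["x", "y"]))
    ),
    columns=["one"],
)

# Each row label is a full tuple with one entry per level.
first_entries = sl_df.index[:3].tolist()
print(first_entries)  # [('A', 'I', 'x'), ('A', 'I', 'y'), ('A', 'II', 'x')]

# Select by level number instead of by position inside a slicer tuple.
# drop_level=False keeps all three levels in the result.
by_xs = sl_df.xs("II", level=1, drop_level=False)

# Equivalent boolean mask built from the values of the second level.
by_mask = sl_df[sl_df.index.get_level_values(1) == "II"]

print(by_xs)
```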