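With hierarchically indexed data, is there a way to easily select a range of values? All the methods I have seen, including xs and .loc, seem to be limited to a single value; see "Benefits of panda's multiindex?". Using this example data: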
import numpy as np
import pandas as pd

M = 100  # Number of rows to generate
# Create some test data with a two-level MultiIndex
df = pd.DataFrame(np.random.randn(M, 10))
df.index = [np.random.randint(4, size=M), np.random.randint(8, size=M)]
# Index.rename() returns a new index, so set the level names in place
df.index.names = ['a', 'b']
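I would like to be able to select everything where the first index is 1 or 2 and the second index is 3 or 4. The closest I have come is using .loc with a list of tuples: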
# Now extract a subset
part = df.loc[[(1, 3), (1,4), (2,3), (2,4)]]
But this produces some strange behavior:
# The old indices are still shown for some reason
print(part.index.levels)
# Good indexing
print("correct:\n", part.loc[(1, 3)])
# No keyerror, although the key wasn't included
print("wrong:\n", part.loc[[(0, 3)]])
# Indexing of first index, and then a column, very odd
print("odd:\n", part.loc[(1, 9)])
# But there is an error accessing the original this way
print("Expected error:\n", df.loc[(1, 9)])
Output:
In [436]: [[0, 1, 2, 3], [0, 1, 2, 3, 4, 5, 6, 7]]
correct:
0 1 2 3 4 5 6 \
1 3 -0.183667 0.578867 -0.944514 0.026295 0.778354 0.603845 0.636486
3 -0.337596 0.018084 -0.654721 -1.121475 -0.561706 0.695095 -0.512936
3 -0.670779 -0.425093 1.262278 -1.806815 0.855900 -0.230683 -0.225658
3 -0.274808 -0.529901 1.265333 0.559646 -1.418687 0.492577 0.141648
7 8 9
1 3 1.109179 -1.569236 -0.617408
3 -0.659310 1.249105 0.032657
3 0.315601 1.100192 -0.389736
3 -0.267462 -0.025189 0.069047
odd:
3 -0.617408
3 0.032657
3 -0.389736
3 0.069047
4 0.217577
4 -0.232357
Name: 9, dtype: float64
wrong:
0 1 2 3 4 5 6 7 8 9
0 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
(truncated)
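So is there a better way than a list of tuples to access multiple parts of a hierarchical index? If not, is there a way to clean up the result after indexing with tuples, so that it gives a sensible error rather than NaN?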
Answer 0: (score: 0)
You can use pd.IndexSlice to get more human-readable slicing:
In [52]: idx = pd.IndexSlice
In [53]: dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']]
Out[53]:
lvl0 a b
lvl1 foo foo
A0 B0 C1 D0 8 10
D1 12 14
C3 D0 24 26
D1 28 30
B1 C1 D0 40 42
D1 44 46
C3 D0 56 58
... ... ...
A3 B0 C1 D1 204 206
C3 D0 216 218
D1 220 222
B1 C1 D0 232 234
D1 236 238
C3 D0 248 250
D1 252 254
[32 rows x 2 columns]
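For the frame in the question, the same slicer idea might look like the sketch below. It is not taken from the original post: the seeded generator, the sort_index() call, and the names part and part_mask are assumptions made so the example is reproducible. A list of labels per level selects every row whose first level is 1 or 2 and whose second level is 3 or 4, and a boolean mask built with get_level_values is included as a cross-check.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the question's random frame (the seed is an assumption
# added here for reproducibility): 100 rows, level 'a' in 0..3, level 'b' in 0..7.
rng = np.random.default_rng(0)
M = 100
df = pd.DataFrame(rng.standard_normal((M, 10)))
df.index = pd.MultiIndex.from_arrays(
    [rng.integers(4, size=M), rng.integers(8, size=M)],
    names=["a", "b"],
)

# Sorting the index is not strictly needed for label lists, but it is
# required for real slices and avoids performance warnings.
df = df.sort_index()

# One list of labels per index level; ':' keeps all columns.
idx = pd.IndexSlice
part = df.loc[idx[[1, 2], [3, 4]], :]

# Equivalent selection with an explicit boolean mask on the level values.
mask = (
    df.index.get_level_values("a").isin([1, 2])
    & df.index.get_level_values("b").isin([3, 4])
)
part_mask = df[mask]

print(part.index.unique())
print(len(part), len(part_mask))
```

Both selections should contain the same rows; the slicer form is shorter, while the mask form makes the per-level conditions explicit.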
See http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers
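On the side question about the old index values still showing up in part.index.levels: a subset keeps the full set of level labels from its parent. A minimal sketch of one way to tidy this up follows, assuming a pandas version that provides MultiIndex.remove_unused_levels. The small deterministic frame and the names levels_before, levels_after, and raised are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (illustrative): every (a, b) pair appears once.
full_index = pd.MultiIndex.from_product([range(4), range(8)], names=["a", "b"])
df = pd.DataFrame(np.arange(64).reshape(32, 2), index=full_index)

# Select a in {1, 2} and b in {3, 4}; copy so the subset owns its index.
part = df.loc[pd.IndexSlice[[1, 2], [3, 4]], :].copy()

# The subset still carries the parent's level labels.
levels_before = [list(level) for level in part.index.levels]
print(levels_before)

# Drop labels that no longer occur in the subset.
part.index = part.index.remove_unused_levels()
levels_after = [list(level) for level in part.index.levels]
print(levels_after)

# Looking up a pair that was not selected fails with a KeyError here.
raised = False
try:
    part.loc[(0, 3), :]
except KeyError:
    raised = True
print("KeyError raised:", raised)
```

Passing the row key together with an explicit column slice, as in part.loc[(0, 3), :], keeps the tuple from being read as a (row, column) pair, which is the ambiguity behind the "odd" result in the question.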