I am trying to select multi-level columns in a DataFrame. For example:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
Output:
first              bar                 baz                 foo
second             one       two       one       two       one       two
first second
bar   one     1.031494 -1.115284 -0.154907  0.044911  2.443488 -0.534575
      two    -0.236643  1.547236  2.132647  0.366896 -0.710489 -0.478956
baz   one    -0.365648  1.517573  0.668234  0.408448 -0.427475 -1.205160
      two     1.362631 -0.785439  1.549837 -0.693337  0.610976 -1.989460
foo   one    -0.449393  0.195214  1.120589  0.413219 -0.820709  0.349553
      two    -1.128392 -0.590630  0.559310 -0.225504  1.721240  1.326330
I can currently select level 0 == 'bar' like this:
df.loc[:,slice("bar")]
This gives me:
first              bar
second             one       two
first second
bar   one     1.031494 -1.115284
      two    -0.236643  1.547236
baz   one    -0.365648  1.517573
      two     1.362631 -0.785439
foo   one    -0.449393  0.195214
      two    -1.128392 -0.590630
This:
df.loc[:,slice(df.columns.levels[0][0])]
also works and gives the same result.
My question: can I get the output above, but using the integer position of the column 'bar'? So instead of:
df.loc[:,slice("bar")]
I would like to use:
df.loc[:,slice(0)]
and get exactly the same output, i.e.:
first              bar
second             one       two
first second
bar   one     1.031494 -1.115284
      two    -0.236643  1.547236
baz   one    -0.365648  1.517573
      two     1.362631 -0.785439
foo   one    -0.449393  0.195214
      two    -1.128392 -0.590630
Additionally, if I do:
df.loc[:,(slice(0), slice(0))]
I would like to get:
first              bar
second             one
first second
bar   one     1.031494
      two    -0.236643
baz   one    -0.365648
      two     1.362631
foo   one    -0.449393
      two    -1.128392
since I am now saying "give me the column where level 0 == 0 (or 'bar') and level 1 == 0 (or 'one')". I can achieve this result using:
df.loc[:,(slice("bar"), slice("one"))]
but I would like to use integers instead.
Answer 0 (score: 1)
You will probably find this unsatisfying, but I think it may not be possible to do what you want directly.
In short, .iloc behaves differently than .loc for MultiIndexes. The upshot is that if you want to use integers, for now you need to look up the column labels yourself.
Taking your DataFrame as an example:
first           bar           baz           foo
second          one    two    one    two    one    two
first second
bar   one    -0.771 -0.211 -0.353  1.305 -0.595  1.174
      two    -1.777 -2.293  1.492 -2.638  0.197  0.406
baz   one    -0.413 -0.932  1.491  0.245  0.624 -0.506
      two    -1.656 -1.053 -0.946 -0.403 -0.416  0.604
foo   one    -0.586  0.030  0.517  0.899 -0.926 -0.777
      two     1.477 -0.691 -1.330  1.022 -0.172  0.503
To select by label, you can do this (example taken from here):
df.loc[:, [('bar', 'one'),]]
# try also df.loc[:, [('bar', 'two'), ('baz', 'one')]]
first           bar
second          one
first second
bar   one    -0.771
      two    -1.777
baz   one    -0.413
      two    -1.656
foo   one    -0.586
      two     1.477
Now, substitute .iloc and keep the same structure:
df.iloc[:, [(0, 0),]]
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
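For comparison, .iloc does accept a single flat integer per column, counted left to right across the whole MultiIndex. The sketch below is only an illustration on a DataFrame with the same column layout as yours (built here with pd.MultiIndex.from_product for brevity, which is my choice and not part of the question); the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Same column layout as the question's DataFrame (values are random).
columns = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(6, 6), index=columns, columns=columns)

# .iloc counts columns left to right and ignores the levels entirely.
by_position = df.iloc[:, [0]]

# The equivalent label-based selection needs the full (level 0, level 1) tuple.
by_label = df.loc[:, [('bar', 'one')]]

# Converting between the two views of the same column.
first_label = df.columns[0]                         # flat position -> label tuple
flat_position = df.columns.get_loc(('bar', 'one'))  # label tuple -> flat position
```

So position 0 here means "the first column overall", not "the first entry of level 0", which is why a tuple of per-level integers has no meaning for .iloc.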
The pandas developers describe this difference as a "deliberate design decision":
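.iloc is a strictly positional indexer that does not regard the structure at all, only the first actual behavior. ... .loc does take into account the level behavior. [emphasis added]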
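One way to "reference the columns yourself" is to translate each per-level integer into a label via df.columns.levels and then select by label. The following is a minimal sketch under that assumption, not a built-in pandas feature: select_by_level_positions is a hypothetical helper name, and the integers index into the sorted unique labels of each level, which need not match the order in which the labels appear in the columns. It filters with equality instead of slice("bar"), because slice("bar") means "everything up to and including 'bar'" and only matches level 0 == 'bar' when 'bar' comes first.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])


def select_by_level_positions(frame, positions):
    """Hypothetical helper: keep columns whose label at each level matches
    the label found at the given integer position in that level.

    positions holds one entry per column level; None means "no filter".
    """
    mask = np.ones(frame.shape[1], dtype=bool)
    for level, pos in enumerate(positions):
        if pos is None:
            continue
        # Integer position -> label, using the level's sorted unique labels.
        label = frame.columns.levels[level][pos]
        # Keep only the columns carrying that label at this level.
        mask &= frame.columns.get_level_values(level) == label
    # A boolean mask keeps the full MultiIndex on the selected columns.
    return frame.loc[:, mask]


bar_columns = select_by_level_positions(df, (0, None))  # level 0 == 'bar'
bar_one = select_by_level_positions(df, (0, 0))         # 'bar' and 'one'
```

With the example DataFrame, bar_columns should contain the ('bar', 'one') and ('bar', 'two') columns, and bar_one only ('bar', 'one'), both with the two-level header intact.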