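I have a DataFrame that looks like a simple use case for a MultiIndex: it is indexed by ISO week number and date, and I want to filter on a particular week. Following the docs, it seems I should only need to pass the week-number string to index into it. However, this gives me a KeyError.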
MCVE:
import numpy as np
import pandas as pd

data = {'foo': {('2016_32', '2016-08-07'): 0.14285714285714285,
('2016_32', '2016-08-08'): 0.14285714285714285,
('2016_32', '2016-08-09'): 0.14285714285714285,
('2016_32', '2016-08-10'): 0.14285714285714285,
('2016_32', '2016-08-11'): 0.14285714285714285,
('2016_32', '2016-08-12'): 0.14285714285714285,
('2016_32', '2016-08-13'): 0.14285714285714285,
('2016_36', '2016-09-04'): 0.14285714285714285,
('2016_36', '2016-09-05'): 0.14285714285714285,
('2016_36', '2016-09-06'): 0.14285714285714285,
('2016_36', '2016-09-07'): 0.14285714285714285,
('2016_36', '2016-09-08'): 0.14285714285714285,
('2016_36', '2016-09-09'): 0.14285714285714285},
'bar': {('2016_32', '2016-08-07'): np.nan,
('2016_32', '2016-08-08'): np.nan,
('2016_32', '2016-08-09'): np.nan,
('2016_32', '2016-08-10'): np.nan,
('2016_32', '2016-08-11'): np.nan,
('2016_32', '2016-08-12'): np.nan,
('2016_32', '2016-08-13'): np.nan,
('2016_36', '2016-09-04'): 0.0,
('2016_36', '2016-09-05'): 0.0,
('2016_36', '2016-09-06'): 0.0,
('2016_36', '2016-09-07'): 0.0,
('2016_36', '2016-09-08'): 0.0,
('2016_36', '2016-09-09'): 0.0}}
df = pd.DataFrame(data)
df['2016_32']
KeyError: '2016_32'
Answer 0 (score: 4)
In general, to select from a MultiIndex use DataFrame.xs:
# level defaults to the first level, so it can be omitted
print (df.xs('2016_32'))
# select by the second level
#print (df.xs('2016-09-07', level=1))
foo bar
2016-08-07 0.142857 NaN
2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN
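A minimal sketch of the other common xs options, assuming a cut-down, made-up version of the question's frame (the values here are illustrative only). It shows `drop_level=False`, which keeps the week level in the result, and `level=1`, which takes a cross-section on the date level:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the question's frame: (ISO week, date) on the index
idx = pd.MultiIndex.from_tuples(
    [('2016_32', '2016-08-07'), ('2016_32', '2016-08-08'),
     ('2016_36', '2016-09-04'), ('2016_36', '2016-09-05')])
df = pd.DataFrame({'foo': [0.5, 0.5, 0.25, 0.25],
                   'bar': [np.nan, np.nan, 0.0, 0.0]}, index=idx)

# Cross-section on the first level; the week level is dropped from the result
week = df.xs('2016_32')
# Same selection, but keep the week level in the result index
week_kept = df.xs('2016_32', drop_level=False)
# Cross-section on the second (date) level
day = df.xs('2016-09-05', level=1)

print(week)
print(week_kept)
print(day)
```

`drop_level=False` is handy when the result is later concatenated with other weeks and you still need to know which week each row came from.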
Or loc:
# no extra argument is needed when selecting on the first level
print (df.loc['2016_32'])
# to select on the second level, pass axis=0 and use : to take all values of the first level
print (df.loc(axis=0)[:, '2016-09-07'])
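An equivalent way to write the second-level selection is `pd.IndexSlice`, which lets you state the row and column parts of `loc` explicitly. A minimal sketch on an illustrative frame (the `foo` values are made up); note that slicing a MultiIndex generally needs a sorted index, hence `sort_index()`:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('2016_32', '2016-08-07'), ('2016_32', '2016-08-08'),
     ('2016_36', '2016-09-04'), ('2016_36', '2016-09-07')])
df = pd.DataFrame({'foo': [1, 2, 3, 4]}, index=idx).sort_index()

ix = pd.IndexSlice
# Every week, one specific date; the trailing ':' keeps all columns
one_day = df.loc[ix[:, '2016-09-07'], :]
# One week, a range of dates within it
part_week = df.loc[ix['2016_32', '2016-08-07':'2016-08-08'], :]

print(one_day)
print(part_week)
```

Unlike `xs`, this form keeps both index levels in the result.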
Selection with a MultiIndex differs between columns and rows:
np.random.seed(235)
a = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
a1 = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E','F']])
df = pd.DataFrame(np.random.randint(10, size=(6, 8)), index=a1, columns=a)
print (df)
bar baz foo qux
one two one two one two one two
A E 8 1 5 8 3 5 3 3
F 3 1 3 6 6 1 0 2
B E 0 3 1 7 0 0 8 2
F 6 7 7 4 2 7 7 5
C E 7 3 1 7 3 9 7 3
F 8 2 0 8 5 2 2 0
# select by the first column level, bar
print (df['bar'])
one two
A E 8 1
F 3 1
B E 0 3
F 6 7
C E 7 3
F 8 2
#select by column bar and then by `one`
print (df['bar']['one'])
A E 8
F 3
B E 0
F 6
C E 7
F 8
Name: one, dtype: int32
# select a single column with a tuple of labels
print (df[('bar', 'one')])
A E 8
F 3
B E 0
F 6
C E 7
F 8
Name: (bar, one), dtype: int32
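Selecting by the *second* column level (every `one` column, whatever the first level is) cannot be done with plain `[]`. A minimal sketch using `xs` with `axis=1`, plus the equivalent `loc` with `slice(None)`, on a small illustrative frame built with `np.arange` rather than the random data above:

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B'], ['E', 'F']])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# Every 'one' column; xs drops the selected level, leaving ['bar', 'baz']
ones = df.xs('one', axis=1, level=1)
# Same data with loc; here both column levels are kept
ones_loc = df.loc[:, (slice(None), 'one')]

print(ones)
print(ones_loc)
```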
For selecting rows (a MultiIndex on the index), use loc:
print (df.loc['A'])
bar baz foo qux
one two one two one two one two
E 8 1 5 8 3 5 3 3
F 3 1 3 6 6 1 0 2
print (df.loc['A'].loc['F'])
bar one 3
two 1
baz one 3
two 6
foo one 6
two 1
qux one 0
two 2
Name: F, dtype: int32
print (df.loc[('A', 'F')])
bar one 3
two 1
baz one 3
two 6
foo one 6
two 1
qux one 0
two 2
Name: (A, F), dtype: int32
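Row and column tuples can be combined in one `loc` call to reach a single cell, and `xs` with `level=1` selects on the second row level. A minimal sketch on an illustrative frame (values from `np.arange`, not the random data above):

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B'], ['E', 'F']])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# A single cell: full row tuple, then full column tuple
cell = df.loc[('A', 'F'), ('bar', 'two')]
# All 'F' rows across every first-level label (A and B)
f_rows = df.xs('F', level=1)

print(cell)
print(f_rows)
```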
Answer 1 (score: 1)
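Alternatively, you can slice by position. Note that swaplevel(0, 0, axis=0) swaps level 0 with itself, so it leaves the index unchanged; the selection itself comes from df[:7], which works only because the seven 2016_32 rows come first: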
>>> df[:7].swaplevel(0, 0, axis=0)
foo bar
2016_32 2016-08-07 0.142857 NaN
2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN
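Or simply slice by position (df[1:7] starts at position 1, so it drops 2016-08-07; df[:7] would include the whole week):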
>>> df[1:7]
foo bar
2016_32 2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN
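Positional slicing depends on where the week's rows happen to sit. A label-based alternative is a boolean mask built from `index.get_level_values`, which works regardless of row order. A minimal sketch on an illustrative cut-down frame:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('2016_32', '2016-08-07'), ('2016_32', '2016-08-08'),
     ('2016_36', '2016-09-04'), ('2016_36', '2016-09-05')])
df = pd.DataFrame({'foo': [0.5, 0.5, 0.25, 0.25],
                   'bar': [np.nan, np.nan, 0.0, 0.0]}, index=idx)

# Compare every first-level label against the wanted week
mask = df.index.get_level_values(0) == '2016_32'
week = df[mask]
# isin extends the same idea to several weeks at once
weeks = df[df.index.get_level_values(0).isin(['2016_32', '2016_36'])]

print(week)
print(weeks)
```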