Selecting a subset of observations from a pandas DataFrame using a list of labels

Date: 2016-02-29 21:52:13

Tags: python pandas

Given the DataFrame di:

import pandas as pd
import numpy as np

data = {
    "Event": ['Biathlon', 'Ski Jump', 'Slalom', 'Downhill'],
    "Award": ['Gold', 'Bronze', 'Gold', 'Silver'],
    "Points":  ['100', '10', '100', '40']
}
d = pd.DataFrame(data)
di = d.set_index(["Award","Event"])

print(di)
                Points
Award  Event          
Gold   Biathlon    100
Bronze Ski Jump     10
Gold   Slalom      100
Silver Downhill     40

Suppose I want to select all rows with a "Gold" award in either "Biathlon" or "Slalom". Why does this fail?

di.loc[('Gold',['Biathlon','Slalom']),:]
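As a side note, one way to get the same rows without depending on how the index is ordered is a boolean mask built from the index levels with get_level_values and isin. This is a different technique from the .loc tuple used in the question, offered here only as a sketch; it rebuilds the question's DataFrame so it runs on its own.

```python
import pandas as pd

# Rebuild the question's DataFrame (Points are stored as strings, as in the question)
data = {
    "Event": ['Biathlon', 'Ski Jump', 'Slalom', 'Downhill'],
    "Award": ['Gold', 'Bronze', 'Gold', 'Silver'],
    "Points": ['100', '10', '100', '40'],
}
di = pd.DataFrame(data).set_index(["Award", "Event"])

# Pull each index level out by name and combine the conditions element-wise
award = di.index.get_level_values("Award")
event = di.index.get_level_values("Event")
mask = (award == "Gold") & event.isin(["Biathlon", "Slalom"])

# Boolean selection keeps the rows in their original order; no sorting needed
gold_subset = di[mask]
print(gold_subset)
```

The trade-off is readability: the mask is more verbose than a .loc tuple, but it works the same way whether or not the index has been sorted.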

Judging by the examples in the pandas documentation, this looks like it should work. Here is the example I copied from the docs:

#example from http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers

def mklbl(prefix,n):
    return ["%s%s" % (prefix,i)  for i in range(n)]

miindex = pd.MultiIndex.from_product([mklbl('A',4),
                                     mklbl('B',2),
                                     mklbl('C',4),
                                     mklbl('D',2)])

micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
                                       ('b','foo'),('b','bah')],
                                      names=['lvl0', 'lvl1'])

dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
                    index=miindex,
                    columns=micolumns).sort_index().sort_index(axis=1)

dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]

#this also works
dfmi.loc[(['A1','A3'],['B0','B1'], ['C1','C3']),:]
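The same documentation section also shows pd.IndexSlice, which lets the slice(...) tuple be written with ordinary colon syntax. A minimal sketch, rebuilding the docs example so it runs standalone and checking that both spellings select the same rows (note that dfmi is sorted with sort_index() before any slicing):

```python
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product([mklbl('A', 4), mklbl('B', 2),
                                      mklbl('C', 4), mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
                                       ('b', 'foo'), ('b', 'bah')],
                                      names=['lvl0', 'lvl1'])
n_rows, n_cols = len(miindex), len(micolumns)
dfmi = pd.DataFrame(np.arange(n_rows * n_cols).reshape((n_rows, n_cols)),
                    index=miindex,
                    columns=micolumns).sort_index().sort_index(axis=1)

# pd.IndexSlice turns idx['A1':'A3', :, ['C1', 'C3']] into the tuple
# (slice('A1', 'A3'), slice(None), ['C1', 'C3'])
idx = pd.IndexSlice
via_indexslice = dfmi.loc[idx['A1':'A3', :, ['C1', 'C3']], :]
via_slice = dfmi.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :]
print(via_indexslice.equals(via_slice))
```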

1 Answer:

Answer 0 (score: 2)

You need to sort the index first with sort_index():

In [15]:
data = {
    "Event": ['Biathlon', 'Ski Jump', 'Slalom', 'Downhill'],
    "Award": ['Gold', 'Bronze', 'Gold', 'Silver'],
    "Points":  ['100', '10', '100', '40']
}
d = pd.DataFrame(data)
di = d.set_index(["Award","Event"])
di = di.sort_index()
di

Out[15]:
                Points
Award  Event          
Bronze Ski Jump     10
Gold   Biathlon    100
       Slalom      100
Silver Downhill     40

In [16]:    
di.loc[('Gold',['Biathlon','Slalom']),:]

Out[16]:
               Points
Award Event          
Gold  Biathlon    100
      Slalom      100
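Whether a MultiIndex is already sorted can be checked with its is_monotonic_increasing property before selecting. The sketch below rebuilds the question's DataFrame, shows that the index produced by set_index is not sorted, sorts it, and then runs the selection from this answer. The variable names are illustrative only.

```python
import pandas as pd

data = {
    "Event": ['Biathlon', 'Ski Jump', 'Slalom', 'Downhill'],
    "Award": ['Gold', 'Bronze', 'Gold', 'Silver'],
    "Points": ['100', '10', '100', '40'],
}
di = pd.DataFrame(data).set_index(["Award", "Event"])

# set_index keeps the rows in insertion order, so the index is not sorted
was_sorted = di.index.is_monotonic_increasing

# sort_index() returns a new DataFrame sorted lexicographically by all levels
di_sorted = di.sort_index()
now_sorted = di_sorted.index.is_monotonic_increasing

# The selection from the answer, applied to the sorted index
subset = di_sorted.loc[('Gold', ['Biathlon', 'Slalom']), :]
print(was_sorted, now_sorted)
print(subset)
```

Since sort_index() returns a copy rather than sorting in place, the result has to be reassigned (di = di.sort_index()), as the answer does.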