Question

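I am trying to run SQL-style queries against a pandas DataFrame with a hierarchical index. The level 0 index is 'Exercise' and the level 1 index is 'Date', which is a datetime. If I specify a value for the level 0 index, I can slice without trouble.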

print(gbed.loc['Bench', pd.to_datetime('2011-01-03')])

But if I use a colon to mean "all rows" for the level 0 index, the slice fails with a KeyError on level 1.

# KeyError: 'the label [2011-01-03 00:00:00] is not in the [columns]'
print(gbed.loc[:, pd.to_datetime('2011-01-03')])

The book quoted below suggests that this kind of inner-level indexing is possible in some cases, but I can't work out when or why it fails.

Selection is even possible in some cases from an “inner” level:
In [267]: data[:, 2]
Out[267]:
a 0.852965

From page 147 of 'Python for Data Analysis'.

Am I specifying the slice incorrectly in the last case? Sample code is below.

import pandas as pd
#make an index with a handful of duplicate dates
dates1 = pd.date_range('1/1/2011', periods=8, freq='D')
dates2 = pd.date_range('1/1/2011', periods=4, freq='D')
dates = dates1.append(dates2)

ex = ['Squat','Squat','Squat','Squat','Squat','Squat','Squat','Squat','Bench','Bench','Bench','Bench',]
wt = [100,120,140,150,150,140,160,172,90,90,100,110]
cols = {'Exercise': ex, 'Weight': wt, 'Date': dates}

sf = pd.DataFrame(cols)

gbed = sf.groupby(['Exercise','Date']).max()
print(gbed)

# These two work: return rows for a specific exercise on 2011-01-03
# SELECT * WHERE Exercise = 'Bench' AND Date = 2011-01-03
print(gbed.loc['Bench', pd.to_datetime('2011-01-03')])
print(gbed.loc['Squat', pd.to_datetime('2011-01-03')])

# I am trying to return all rows that have a date of '2011-01-03'
# SELECT * WHERE Date = 2011-01-03
# KeyError: 'the label [2011-01-03 00:00:00] is not in the [columns]'
print(gbed.loc[:, pd.to_datetime('2011-01-03')])

Answer 1

To select by MultiIndex, use DataFrame.xs or slicers, which are very useful for complex selections:

print (gbed.xs('2011-01-03', level=1, axis=0))
          Weight
Exercise        
Bench        100
Squat        140

print (gbed.xs('2011-01-03', level=1, axis=0, drop_level=False))
                     Weight
Exercise Date              
Bench    2011-01-03     100
Squat    2011-01-03     140
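
As a variation on the xs calls above, the level can also be given by name instead of position, and a boolean mask on the inner level reads much like the SQL WHERE clause in the question. The following is only a sketch: it rebuilds gbed from the question's data, and the names ts, by_name, mask and by_mask are introduced here for illustration.

```python
import pandas as pd

# Rebuild the grouped frame from the question.
dates = pd.date_range('1/1/2011', periods=8, freq='D').append(
    pd.date_range('1/1/2011', periods=4, freq='D'))
sf = pd.DataFrame({
    'Exercise': ['Squat'] * 8 + ['Bench'] * 4,
    'Weight': [100, 120, 140, 150, 150, 140, 160, 172, 90, 90, 100, 110],
    'Date': dates,
})
gbed = sf.groupby(['Exercise', 'Date']).max()

ts = pd.Timestamp('2011-01-03')

# Same selection as level=1, but naming the level explicitly.
by_name = gbed.xs(ts, level='Date', drop_level=False)

# Boolean mask on the inner level: SELECT * WHERE Date = 2011-01-03
mask = gbed.index.get_level_values('Date') == ts
by_mask = gbed[mask]

print(by_name)
print(by_mask)
```

Both selections keep the full (Exercise, Date) index, so they can be compared directly with the drop_level=False output shown above.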

idx = pd.IndexSlice
print (gbed.loc[idx[:, '2011-01-03'], :])
                     Weight
Exercise Date              
Bench    2011-01-03     100
Squat    2011-01-03     140

idx = pd.IndexSlice
print (gbed.loc[idx['Bench', '2011-01-03'], :])
                     Weight
Exercise Date              
Bench    2011-01-03     100
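
As for why the original attempt fails: the book's data[:, 2] example indexes a Series, which has a single axis, so both positions refer to index levels. On a DataFrame, the second position in .loc refers to the columns, which matches the "is not in the [columns]" message in the question. The sketch below uses small made-up data (series and frame are hypothetical, not taken from the question or the book) to show the difference.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question.
index = pd.MultiIndex.from_product([['a', 'b'], [1, 2]],
                                   names=['outer', 'inner'])
series = pd.Series([10, 20, 30, 40], index=index)
frame = series.to_frame('value')

# Series: one axis, so the second position addresses the inner index level.
from_series = series.loc[:, 2]
print(from_series)

# DataFrame: the second position addresses columns, so the row key
# must be a single tuple covering both index levels.
from_frame = frame.loc[(slice(None), 2), :]
print(from_frame)

# The bare form looks for a column labelled 2 and raises KeyError.
raised = False
try:
    frame.loc[:, 2]
except KeyError as err:
    raised = True
    print('KeyError:', err)
```

pd.IndexSlice[:, 2] builds the same (slice(None), 2) tuple, which is why the slicer form in this answer works on the DataFrame.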

Slicing the inner index of a pandas hierarchical index throws a KeyError
