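I am trying to run SQL-style queries against a pandas DataFrame with a hierarchical index. The level 0 index is 'Exercise' and level 1 is 'Date', which is a datetime. If I specify a value for the level 0 index, I can slice without any trouble: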
print(gbed.loc['Bench', pd.to_datetime('2011-01-03')])
But if I use a colon to mean "all rows" for the level 0 index, the slice fails with a KeyError on level 1:
# KeyError: 'the label [2011-01-03 00:00:00] is not in the [columns]'
print(gbed.loc[:, pd.to_datetime('2011-01-03')])
The book quoted below suggests that this kind of inner-level indexing is possible in some cases, but I can't figure out when or why it doesn't work.
Selection is even possible in some cases from an “inner” level: In [267]: data[:, 2] Out[267]: a 0.852965
From page 147 of 'Python for Data Analysis'.
Am I specifying the slice incorrectly in that last case? Sample code follows.
import pandas as pd
#make an index with a handful of duplicate dates
dates1 = pd.date_range('1/1/2011', periods=8, freq='D')
dates2 = pd.date_range('1/1/2011', periods=4, freq='D')
dates = dates1.append(dates2)
ex = ['Squat','Squat','Squat','Squat','Squat','Squat','Squat','Squat','Bench','Bench','Bench','Bench',]
wt = [100,120,140,150,150,140,160,172,90,90,100,110]
cols = {'Exercise': ex, 'Weight': wt, 'Date': dates}
sf = pd.DataFrame(cols)
gbed = sf.groupby(['Exercise','Date']).max()
print(gbed)
#These two work: return rows for a specific exercise on 2011-01-03
# SELECT * WHERE Exercise = 'Bench' AND Date = 2011-01-03
print(gbed.loc['Bench', pd.to_datetime('2011-01-03')])
print(gbed.loc['Squat', pd.to_datetime('2011-01-03')])
#I am trying to return all rows that have a date of '2011-01-03'
# SELECT * WHERE Date = 2011-01-03
# KeyError: 'the label [2011-01-03 00:00:00] is not in the [columns]'
print(gbed.loc[:, pd.to_datetime('2011-01-03')])
Answer 0 (score: 1)
To select by MultiIndex, use DataFrame.xs or slicers, which are very useful for complex selections:
print (gbed.xs('2011-01-03', level=1, axis=0))
          Weight
Exercise
Bench        100
Squat        140
print (gbed.xs('2011-01-03', level=1, axis=0, drop_level=False))
                     Weight
Exercise Date
Bench    2011-01-03     100
Squat    2011-01-03     140
idx = pd.IndexSlice
print (gbed.loc[idx[:, '2011-01-03'], :])
                     Weight
Exercise Date
Bench    2011-01-03     100
Squat    2011-01-03     140
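Slicers also cover range conditions on the inner level. The following is only a sketch: it rebuilds the frame from the question so it runs on its own, and the names `start`, `end` and `in_range` are illustrative, not part of the original code. It assumes the index is sorted, which should hold for the groupby result here.

```python
import pandas as pd

# Same frame as in the question, rebuilt so this snippet runs on its own.
dates = pd.date_range('1/1/2011', periods=8, freq='D').append(
    pd.date_range('1/1/2011', periods=4, freq='D'))
sf = pd.DataFrame({
    'Exercise': ['Squat'] * 8 + ['Bench'] * 4,
    'Weight': [100, 120, 140, 150, 150, 140, 160, 172, 90, 90, 100, 110],
    'Date': dates,
})
gbed = sf.groupby(['Exercise', 'Date']).max()

idx = pd.IndexSlice
# Illustrative bounds (not from the original post).
start, end = pd.Timestamp('2011-01-02'), pd.Timestamp('2011-01-03')

# SELECT * WHERE Date BETWEEN 2011-01-02 AND 2011-01-03
# Label slices include both endpoints; slicing an inner level like this
# needs a sorted MultiIndex, which groupby produces by default.
in_range = gbed.loc[idx[:, start:end], :]
print(in_range)
```

The trailing `, :` is the column slicer; spelling it out keeps the row tuple from being read as a (rows, columns) pair.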
idx = pd.IndexSlice
print (gbed.loc[idx['Bench', '2011-01-03'], :])
                     Weight
Exercise Date
Bench    2011-01-03     100
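As for why the original `gbed.loc[:, date]` fails: on a DataFrame, the second position in `.loc[...]` addresses the columns, so pandas looks for the timestamp among the column labels (only `Weight` here). The book's `data[:, 2]` example is, as far as I can tell, on a Series, which has no column axis to compete with the inner level. A minimal sketch of this, plus a boolean mask as another way to write the WHERE clause; it rebuilds the question's frame, and the names `day`, `failed`, `mask`, `by_mask` and `by_slicer` are illustrative:

```python
import pandas as pd

# Rebuild the frame from the question (same data, same groupby).
dates = pd.date_range('1/1/2011', periods=8, freq='D').append(
    pd.date_range('1/1/2011', periods=4, freq='D'))
sf = pd.DataFrame({
    'Exercise': ['Squat'] * 8 + ['Bench'] * 4,
    'Weight': [100, 120, 140, 150, 150, 140, 160, 172, 90, 90, 100, 110],
    'Date': dates,
})
gbed = sf.groupby(['Exercise', 'Date']).max()

day = pd.Timestamp('2011-01-03')

# .loc[rows, columns]: the timestamp is looked up among the column
# labels, where only 'Weight' exists, so a KeyError is raised.
try:
    gbed.loc[:, day]
    failed = False
except KeyError:
    failed = True
print('plain .loc[:, day] raised KeyError:', failed)

# SELECT * WHERE Date = 2011-01-03, written as a boolean mask on the
# 'Date' level of the row index.
mask = gbed.index.get_level_values('Date') == day
by_mask = gbed[mask]

# The slicer form from the answer, for comparison.
by_slicer = gbed.loc[pd.IndexSlice[:, day], :]
print(by_mask)
```

The mask form is more verbose than `xs`, but masks on several levels can be combined with `&` and `|` when a query has more than one condition.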