Slicing a pandas MultiIndex DataFrame

Time: 2016-11-14 17:31:40

Tags: pandas dataframe slice multi-index

To keep track of all simulation results in a parametric run, I create a MultiIndex DataFrame named dfParRun in pandas, as follows:

import pandas as pd
import numpy as np
import itertools
limOpt = [0.1,1,10]
reimbOpt = ['Cash','Time']
xOpt = [0.1, .02, .03, .04, .05, .06, .07, .08]
zOpt = [1, 5, 10]
arrays = [limOpt, reimbOpt, xOpt, zOpt]
parameters = list(itertools.product(*arrays))
nPar = len(parameters)

variables = ['X', 'Y', 'Z']
nVar = len(variables)
index = pd.MultiIndex.from_tuples(parameters, names=['lim', 'reimb', 'xMax', 'zMax'])

dfParRun = pd.DataFrame(np.random.rand(nPar, nVar), index=index, columns=variables)

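To analyze my parametric run, I want to slice this DataFrame, but that seems cumbersome. For example, I would like all results with xMax above 0.5 and lim equal to 10. So far, the only approach I have found that works is: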

df = dfParRun.reset_index()
df.loc[(df.xMax>0.5) & (df.lim==10)]

Is there a way to do this without resetting the index of the DataFrame?

1 answer:

Answer 0 (score: 2)

Option 1
Use pd.IndexSlice
Caveat: requires sort_index

dfParRun.sort_index().loc[pd.IndexSlice[10, :, .0500001:, :]]
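As a minimal sketch of Option 1, the block below builds a small stand-in frame (the name demo and its reduced parameter lists are assumptions made for illustration, not part of the question) and checks the IndexSlice result against a plain boolean mask. The column axis is passed explicitly as a trailing ":" to keep the row indexer unambiguous.

```python
import itertools

import numpy as np
import pandas as pd

# Small stand-in for dfParRun; the values are random, only the index matters here.
levels = [[0.1, 1, 10], ['Cash', 'Time'], [0.02, 0.05, 0.08], [1, 5]]
index = pd.MultiIndex.from_tuples(
    list(itertools.product(*levels)),
    names=['lim', 'reimb', 'xMax', 'zMax'],
)
demo = pd.DataFrame(np.random.rand(len(index), 2), index=index, columns=['X', 'Y'])

idx = pd.IndexSlice

# Label slices on a MultiIndex need a sorted index, so sort first.
demo_sorted = demo.sort_index()

# Rows: lim == 10, any reimb, xMax from 0.0500001 upward, any zMax.
# The trailing ":" selects all columns.
sliced = demo_sorted.loc[idx[10, :, 0.0500001:, :], :]

# Cross-check with a boolean mask built from the index levels.
mask = (
    (demo_sorted.index.get_level_values('lim') == 10)
    & (demo_sorted.index.get_level_values('xMax') > 0.05)
)
expected = demo_sorted[mask]
```

The slice start 0.0500001 sits just above 0.05 because label slices include their endpoints; starting at 0.05 would also return the rows where xMax equals 0.05.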


Option 2
Use query on df, the DataFrame obtained after your reset_index

df.query('xMax > 0.05 & lim == 10')
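Since the question asks specifically about avoiding reset_index, it may help to note that query can also refer to named index levels directly. The sketch below uses a small stand-in frame (demo and its reduced parameter lists are illustrative assumptions) and compares both routes.

```python
import itertools

import numpy as np
import pandas as pd

# Small stand-in for dfParRun with the same level names as in the question.
levels = [[0.1, 1, 10], ['Cash', 'Time'], [0.02, 0.05, 0.08], [1, 5]]
index = pd.MultiIndex.from_tuples(
    list(itertools.product(*levels)),
    names=['lim', 'reimb', 'xMax', 'zMax'],
)
demo = pd.DataFrame(np.random.rand(len(index), 2), index=index, columns=['X', 'Y'])

# query resolves the names xMax and lim against the index levels,
# so the MultiIndex stays in place and no sorting is needed.
by_query = demo.query('xMax > 0.05 and lim == 10')

# The same filter applied after reset_index, for comparison.
via_reset = demo.reset_index().query('xMax > 0.05 and lim == 10')
```

Both results hold the same rows; by_query keeps the four index levels, while via_reset carries them as ordinary columns.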


Setup

import pandas as pd
import numpy as np
import itertools
limOpt = [0.1,1,10]
reimbOpt = ['Cash','Time']
xOpt = [0.1, .02, .03, .04, .05, .06, .07, .08]
zOpt = [1, 5, 10]
arrays = [limOpt, reimbOpt, xOpt, zOpt]
parameters = list(itertools.product(*arrays))
nPar = len(parameters)

variables = ['X', 'Y', 'Z']
nVar = len(variables)
index = pd.MultiIndex.from_tuples(parameters, names=['lim', 'reimb', 'xMax', 'zMax'])

dfParRun = pd.DataFrame(np.random.rand(nPar, nVar), index=index, columns=variables)
df = dfParRun.reset_index()
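
A further route, not part of the answer above, is a boolean mask built with Index.get_level_values. It needs neither reset_index nor sort_index. The sketch below is illustrative only: demo and its reduced parameter lists are assumptions standing in for dfParRun.

```python
import itertools

import numpy as np
import pandas as pd

# Small stand-in for dfParRun with the same level names as in the setup.
levels = [[0.1, 1, 10], ['Cash', 'Time'], [0.02, 0.05, 0.08], [1, 5]]
index = pd.MultiIndex.from_tuples(
    list(itertools.product(*levels)),
    names=['lim', 'reimb', 'xMax', 'zMax'],
)
demo = pd.DataFrame(np.random.rand(len(index), 2), index=index, columns=['X', 'Y'])

# Pull each level out as a flat Index and compare element-wise.
x_max = demo.index.get_level_values('xMax')
lim = demo.index.get_level_values('lim')

# Boolean array with one entry per row; combine conditions with "&".
mask = (x_max > 0.05) & (lim == 10)
selected = demo[mask]
```

Because the mask is an ordinary boolean array, any element-wise condition can be used, including ones that a label slice cannot express.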