Question

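I want to build a function that makes filtering a DataFrame with a dynamic MultiIndex more user-friendly.

For example, the function takes a dictionary mapping index labels to filter values, along with a tuple of metrics.

To do this, the function must not make any assumptions about the presence or order of the index labels. The closest thing I have found is df.xs().

Sample code: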

import numpy as np
import pandas as pd
df = pd.DataFrame({'lab1': np.random.choice(['A','B','C'],100,replace=True), 'lab2': np.random.choice(['one','two','three','four'],100,replace=True), 'val': np.random.rand(100)})
df = df.groupby(['lab1','lab2']).sum()

                 val
lab1 lab2           
A    four   3.296221
     one    5.057798
     three  3.443166
     two    3.913044
B    four   3.815448
     one    3.892152
     three  2.995777
     two    9.715343
C    four   6.118737
     one    3.735783
     three  2.461903
     two    5.252095

Here is a static example using .xs():

 df.xs(('A', slice('one','three')), level=['lab1','lab2'])
                 val
lab1 lab2           
A    one    5.057798
     three  3.443166

The problem seems to be that you cannot easily pass a list argument into slice(). I have tried pd.IndexSlice, map, lambda, etc., but could not get it to work.

Here is the idea of what I want to get:

# Intended interface only: this does not work as written.
filters = {
    'lab1': 'A',
    'lab2': ('one', 'three'),
}
metrics = ('val',)

def metric_ts(filters, metrics):
    levels = list(filters.keys()) + ['metric_name']
    keys = map(slice, list(filters.values()))
    return df_norm.xs(keys, levels)

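Note: I know this can be done in several ways with .loc[] and similar. I am looking for a very general solution that does not rely on positional syntax. Thanks!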

Answer 1

I am not sure how to do this with xs, but you can use DataFrame.query, provided you can build the query string dynamically.

filters = {
'lab1': 'A',
'lab2' : ('one','three'),
}
metrics = 'val'

# Expose the filter values as global variables so that @lab1 and @lab2 resolve inside query()
globals().update(filters)

querystr = ' and '.join([
    f"{k} {'==' if isinstance(v, (str, np.number)) else 'in'} @{k}" 
    for k, v in filters.items()])

df.query(querystr)[metrics]  

lab1  lab2 
A     one      4.041335
      three    4.923771
Name: val, dtype: float64

You can see a similar example here.
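The globals().update(filters) step can be avoided by writing the filter values into the query string as literals. The following is only a sketch under stated assumptions, not the answerer's method: the helper names `_literal` and `build_query` are hypothetical, filter values are assumed to be plain strings or numbers (or collections of them), and the index level names are assumed to be valid Python identifiers. A small deterministic frame replaces the random data so the result is reproducible.

```python
import numpy as np
import pandas as pd


def _literal(value):
    """Render one filter value as a Python literal for the query string."""
    if isinstance(value, np.generic):
        value = value.item()  # unwrap NumPy scalars to plain Python values
    return repr(value)


def build_query(filters):
    """Build a DataFrame.query string from {level_name: value or collection}."""
    parts = []
    for name, value in filters.items():
        if isinstance(value, (str, int, float, np.generic)):
            # Single label: equality test
            parts.append(f"{name} == {_literal(value)}")
        else:
            # Collection of labels: membership test
            items = ", ".join(_literal(v) for v in value)
            parts.append(f"{name} in [{items}]")
    return " and ".join(parts)


# Deterministic stand-in for the grouped frame in the question
idx = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["four", "one", "three", "two"]],
    names=["lab1", "lab2"],
)
df = pd.DataFrame({"val": np.arange(12, dtype=float)}, index=idx)

filters = {"lab1": "A", "lab2": ("one", "three")}
metrics = "val"

querystr = build_query(filters)
result = df.query(querystr)[metrics]
print(querystr)
print(result)
```

Because the values are embedded with repr(), this sketch is only suitable for trusted input; it is not a general-purpose escaping scheme.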

Answer 2

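I figured out how to do this with the .xs() method. The trick is to wrap multiple labels in slice() inside the filters dictionary before passing it to the function. IMO, this is cleaner than parsing the dictionary and using .query().

The only remaining problem is that slice() returns a contiguous slice based on the index order (I would like it to return only the specified values). Hopefully someone can expand on this.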

df = pd.DataFrame({'lab1': np.random.choice(['A','B','C'],100,replace=True), 'lab2': np.random.choice(['one','two','three','four'],100,replace=True), 'val': np.random.rand(100)})
df = df.groupby(['lab1','lab2']).sum()

filters = {
'lab1': slice('A','C'),
'lab2' : slice('one','two')
}

def return_slice(filters):
    slices = pd.IndexSlice[tuple(filters.values())]
    levels = list(filters.keys())
    return df.xs(key=slices, level=levels,drop_level=False)

return_slice(filters)

                 val
lab1 lab2           
A    one    3.094135
     three  4.458957
     two    6.896360
B    one    2.917692
     three  6.754484
     two    4.023079
C    one    4.464885
     three  5.982234
     two    4.421695
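One possible way to return only the listed labels, rather than a contiguous range, is to build a boolean mask from each index level looked up by name. This is a different technique from .xs(), sketched here under assumptions: the function name `filter_levels` is hypothetical, and each filter value is assumed to be either a single label or a list, tuple, or set of labels. A small deterministic frame is used instead of the random data.

```python
import numpy as np
import pandas as pd


def filter_levels(df, filters):
    """Keep rows whose index labels match every entry of `filters`.

    Levels are looked up by name, so neither the order of the levels in the
    index nor the order of the keys in `filters` matters. An unknown level
    name raises KeyError.
    """
    mask = np.ones(len(df), dtype=bool)
    for name, wanted in filters.items():
        if isinstance(wanted, (list, tuple, set)):
            wanted = list(wanted)
        else:
            wanted = [wanted]  # treat a single label as a one-element list
        mask &= df.index.get_level_values(name).isin(wanted)
    return df[mask]


# Deterministic stand-in for the grouped frame in the question
idx = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["four", "one", "three", "two"]],
    names=["lab1", "lab2"],
)
df = pd.DataFrame({"val": np.arange(12, dtype=float)}, index=idx)

# Keys are deliberately given in a different order than the index levels
filters = {"lab2": ("one", "two"), "lab1": ["A", "C"]}
selected = filter_levels(df, filters)
print(selected)
```

Unlike slice('one','two'), this does not pick up 'three', which merely sorts between the two requested labels.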

How to pass parameters to df.xs()
