Question

I have a DataFrame with a three-level MultiIndex:

id    foo  bar    col1
0     1    a -0.225873
      2    a -0.275865
      2    b -1.324766
3     1    a -0.607122
      2    a -1.465992
      2    b -1.582276
      3    b -0.718533
7     1    a -1.904252
      2    a  0.588496
      2    b -1.057599
      3    a  0.388754
      3    b -0.940285

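Keeping the id index level, I want to sum along the foo and bar levels, but over different values for each id.

For example, for id = 0 I want to sum over foo = [1] and bar = ["a", "b"]; for id = 3, over foo = [2] and bar = ["a", "b"]; and for id = 7, over foo = [1, 2] and bar = ["a"]. This gives the result: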

id    col1
0     -0.225873    
3     -3.048268   
7     -1.315756

I have been trying something along these lines:

df.loc(axis=0)[[(0, 1, ["a", "b"]), (3, 2, ["a", "b"]), (7, [1, 2], "a")]].sum()

I am not sure whether this is even possible. Any elegant solution (possibly dropping the MultiIndex?) would be much appreciated!

Answer 1

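The list of tuples is not the problem. The problem is that each tuple does not correspond to a single index entry, because a list is not a valid key. To index a DataFrame like this, you need to expand the lists inside each tuple into their own entries.

Define your selections as something like the following list of dictionaries, then transform it with a list comprehension and index with all of the individual entries.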

d = [
  {
    'id': 0,
    'foo': [1],
    'bar': ['a', 'b']
  },
  {
    'id': 3,
    'foo': [2],
    'bar': ['a', 'b']
  },
  {
    'id': 7,
    'foo': [1, 2],
    'bar': ['a']
  },
]

all_idx = [
    (el['id'], i, j)
    for el in d
    for i in el['foo']
    for j in el['bar']
]

# [(0, 1, 'a'), (0, 1, 'b'), (3, 2, 'a'), (3, 2, 'b'), (7, 1, 'a'), (7, 2, 'a')]

df.loc[all_idx].groupby(level=0).sum()

        col1
id
0  -0.225873
3  -3.048268
7  -1.315756
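
One caveat: depending on the pandas version, .loc may raise a KeyError when the list contains a tuple that is not present in the index, such as (0, 1, 'b') here. Below is a minimal sketch that sidesteps this by building a boolean mask with MultiIndex.isin instead of indexing by label. The frame is rebuilt from the data in the question, and the names rows, selection, wanted and mask are mine, for illustration only.

```python
import pandas as pd

# Rebuild the example frame from the question.
rows = [
    (0, 1, "a", -0.225873), (0, 2, "a", -0.275865), (0, 2, "b", -1.324766),
    (3, 1, "a", -0.607122), (3, 2, "a", -1.465992), (3, 2, "b", -1.582276),
    (3, 3, "b", -0.718533),
    (7, 1, "a", -1.904252), (7, 2, "a", 0.588496), (7, 2, "b", -1.057599),
    (7, 3, "a", 0.388754), (7, 3, "b", -0.940285),
]
df = pd.DataFrame(rows, columns=["id", "foo", "bar", "col1"])
df = df.set_index(["id", "foo", "bar"])

# Per-id selections, same shape as the list of dictionaries above.
selection = [
    {"id": 0, "foo": [1], "bar": ["a", "b"]},
    {"id": 3, "foo": [2], "bar": ["a", "b"]},
    {"id": 7, "foo": [1, 2], "bar": ["a"]},
]

# Expand every (id, foo, bar) combination into its own tuple.
wanted = [
    (el["id"], i, j)
    for el in selection
    for i in el["foo"]
    for j in el["bar"]
]

# True where a row's full index tuple is one of the wanted keys;
# combinations that do not exist in the index are simply never matched.
mask = df.index.isin(wanted)
result = df[mask].groupby(level="id").sum()
print(result)
```

The mask approach trades label lookup for a membership test, so it does not care whether every requested combination actually exists.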

Answer 2

A more concise solution using slicers:


# the trailing ", :" makes it explicit that each tuple indexes rows only
sections = [(0, 1, slice(None)), (3, 2, slice(None)), (7, slice(1, 2), "a")]
pd.concat(df.loc[s, :] for s in sections).groupby("id").sum()

        col1
id
0  -0.225873
3  -3.048268
7  -1.315756

Two things to note:

Because pd.concat creates a new DataFrame, this may be less memory-efficient than the accepted answer.
slice(None) is required; otherwise the index columns of the df.loc[s] results do not match when pd.concat is called.
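
For anyone who wants to try the slicer approach end to end, here is a self-contained sketch. It assumes the data from the question; the sort_index() call and the ", :" column indexer are my additions, since label slicing on a MultiIndex needs a sorted index and the pandas documentation recommends specifying all axes when using slicers. How a bare three-element tuple passed to .loc is interpreted can vary between pandas versions, so treat this as one way to do it rather than the only one.

```python
import pandas as pd

# Rebuild the example frame from the question.
rows = [
    (0, 1, "a", -0.225873), (0, 2, "a", -0.275865), (0, 2, "b", -1.324766),
    (3, 1, "a", -0.607122), (3, 2, "a", -1.465992), (3, 2, "b", -1.582276),
    (3, 3, "b", -0.718533),
    (7, 1, "a", -1.904252), (7, 2, "a", 0.588496), (7, 2, "b", -1.057599),
    (7, 3, "a", 0.388754), (7, 3, "b", -0.940285),
]
df = pd.DataFrame(rows, columns=["id", "foo", "bar", "col1"])
# Slicing by label on a MultiIndex requires the index to be sorted.
df = df.set_index(["id", "foo", "bar"]).sort_index()

# One row selector per id. slice(None) means "every value on this level";
# slice(1, 2) is label-based, so both endpoints are included.
sections = [
    (0, 1, slice(None)),
    (3, 2, slice(None)),
    (7, slice(1, 2), "a"),
]

# Select each section with all index levels kept, then sum per id.
parts = [df.loc[s, :] for s in sections]
result = pd.concat(parts).groupby(level="id").sum()
print(result)
```

Because every selector contains at least one slice, each partial result keeps all three index levels, which is what lets pd.concat line them up before grouping.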

Selecting values from a Pandas MultiIndex using a list of tuples
