Question

I have a MultiIndex like the following:

import pandas as pd

df=pd.DataFrame({"A":[1,1,1,1, 1], 
                 "B":[2,2,2,2,2], 
                 "C":[2,2,2,2,2], 
                 "D":[1,1,1,1,1]}).T 

df1=pd.DataFrame({"A":[1,1,1,1,1,1,1,1], 
                 "B":[2,2,2,2,2,2,2,2,], 
                 "C":[1,1,1,1,1,1,1,1,], 
                 "D":[2,2,2,2,2,2,2,2,]}).T

df= pd.concat([df, df1], axis=1) 

col0 = pd.Series(['Set 1','Set 1','Set 1','Set 1','Set 1','Set 2' ,'Set 2' ,'Set 2' ,'Set 2' ,'Set 2' ,'Set 2' ,'Set 2' ,'Set 2'])
col1 = df.columns
arrays = [col0, col1]
df.columns = arrays
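A self-contained sketch of the same input, assuming only pandas: pd.concat with keys= builds the two-level column index in one step, instead of assigning a list of arrays to df.columns afterwards. The names set1 and set2 are just illustrative.

```python
import pandas as pd

# Rows A to D; 'Set 1' has 5 columns and 'Set 2' has 8, as in the question
set1 = pd.DataFrame({"A": [1] * 5, "B": [2] * 5, "C": [2] * 5, "D": [1] * 5}).T
set2 = pd.DataFrame({"A": [1] * 8, "B": [2] * 8, "C": [1] * 8, "D": [2] * 8}).T

# keys= puts the set name on the first column level; the original
# 0..n-1 column labels of each frame become the second level
df = pd.concat([set1, set2], axis=1, keys=["Set 1", "Set 2"])
print(df)
```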

This gives the output:

  Set 1             Set 2                     
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  1  1  1  1     1  1  1  1  1  1  1  1
B     2  2  2  2  2     2  2  2  2  2  2  2  2
C     2  2  2  2  2     1  1  1  1  1  1  1  1
D     1  1  1  1  1     2  2  2  2  2  2  2  2

But I want to transform this matrix into the following desired output:

  Set 1             Set 2                     
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  2  2  1  1     1  2  2  1  1  2  2  1
B     2  1  1  2  2     2  1  1  2  2  1  1  2
C     2  1  1  2  2     1  2  2  1  1  2  2  1
D     1  2  2  1  1     2  1  1  2  2  1  1  2

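Explanation: I basically only want to keep the value in the first column of each MultiIndex group 'Set #'. The values after the first column should alternate based on that first value: they switch between 1 and 2 every two columns.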
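To make the rule concrete, here is a small plain-Python sketch; target_pattern is a hypothetical helper, not part of pandas. It assumes each row holds only the values 1 and 2, so the "other" value is 3 - first, and that the flipped positions are exactly those whose index modulo 4 is 1 or 2.

```python
def target_pattern(first, length):
    """Return the desired values for one row of one set (hypothetical helper).

    Position 0 keeps `first`; positions whose index modulo 4 is 1 or 2
    hold the other value, so the row reads first, other, other, first,
    first, other, other, ...
    """
    other = 3 - first  # swaps 1 <-> 2
    return [other if k % 4 in (1, 2) else first for k in range(length)]


print(target_pattern(1, 5))  # [1, 2, 2, 1, 1]
print(target_pattern(2, 8))  # [2, 1, 1, 2, 2, 1, 1, 2]
```

The second call matches row B of 'Set 2' in the desired output.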

Any help is appreciated!

Answer 1

Here is one approach. First, I start with your data (just transposed):

from itertools import cycle
import pandas as pd

# df created as per original post, then transposed (not shown to save space)

print(df.transpose())
  Set 1             Set 2                     
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  1  1  1  1     1  1  1  1  1  1  1  1
B     2  2  2  2  2     2  2  2  2  2  2  2  2
C     2  2  2  2  2     1  1  1  1  1  1  1  1
D     1  1  1  1  1     2  2  2  2  2  2  2  2

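Second, I created a function that takes the first score in the set and then continues with alternating pairs (e.g. 1 2 2 1 1 2 2 1 1 ...). The function returns a pandas Series.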

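def func(s):
    assert isinstance(s, pd.Series)
    # iloc makes the positional lookup explicit; on a MultiIndex, s[0]
    # relies on a positional fallback that recent pandas deprecates
    first = s.iloc[0]
    assert first in {1, 2}

    # after a 1 the row continues 2, 2, 1, 1, ...; after a 2 it continues 1, 1, 2, 2, ...
    if first == 1:
        serves = [2, 2, 1, 1]
    else:
        serves = [1, 1, 2, 2]

    # keep the first value, then take s.size - 1 items from the repeating cycle
    indicators = [first] + [
        c for _, c in zip(range(1, s.size), cycle(serves))
    ]
    return pd.Series(data=indicators, index=s.index, name=s.name)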
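The zip(range(1, s.size), cycle(serves)) idiom simply takes the first s.size - 1 items of an endless cycle. As a sketch, itertools.islice expresses the same thing more directly; alternating is a hypothetical stand-alone helper that mirrors only the list-building part of func, without the pandas wrapping.

```python
from itertools import cycle, islice


def alternating(first, size):
    # Same serve order as func: after a 1 comes 2, 2, 1, 1, ...;
    # after a 2 comes 1, 1, 2, 2, ...
    serves = [2, 2, 1, 1] if first == 1 else [1, 1, 2, 2]
    # islice takes the first size - 1 items of the infinite cycle
    return [first] + list(islice(cycle(serves), size - 1))


print(alternating(1, 8))  # [1, 2, 2, 1, 1, 2, 2, 1]
print(alternating(2, 5))  # [2, 1, 1, 2, 2]
```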

Third, I group the data frame by the first level of the index (Set 1, Set 2, ...) and apply the function to each group.

grouped = df.groupby( df.index.get_level_values(0) )

# groups maps each set name to the index labels of that set's rows
for idx in grouped.groups.values():
    df.loc[idx] = df.loc[idx].apply(func)
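A vectorised sketch of the same idea, offered as an alternative rather than this answer's method. It assumes the transposed layout used here (rows indexed by (set, position), columns A to D) and values limited to 1 and 2. groupby(level=0).cumcount() gives each row's position within its set, transform("first") repeats each set's first row down the set, and np.where picks either that first value or its flip, replacing the per-group Python loop with whole-frame operations.

```python
import numpy as np
import pandas as pd

# Transposed layout: rows are (set, position), columns are A to D
index = pd.MultiIndex.from_tuples(
    [("Set 1", i) for i in range(5)] + [("Set 2", i) for i in range(8)]
)
df = pd.DataFrame(
    {"A": 1, "B": 2, "C": [2] * 5 + [1] * 8, "D": [1] * 5 + [2] * 8},
    index=index,
)

groups = df.groupby(level=0)
pos = groups.cumcount()            # 0, 1, 2, ... within each set
first = groups.transform("first")  # each set's first row, repeated down the set

# Positions 1, 2, 5, 6, ... take the other value; shape (n_rows, 1) to broadcast
flip = (pos % 4).isin([1, 2]).to_numpy()[:, None]

result = pd.DataFrame(
    np.where(flip, 3 - first, first), index=df.index, columns=df.columns
)
print(result.T)
```

Because only the first row of each set is read, the result does not depend on whatever values the later rows held.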

print(df.transpose())

  Set 1             Set 2                     
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  2  2  1  1     1  2  2  1  1  2  2  1
B     2  1  1  2  2     2  1  1  2  2  1  1  2
C     2  1  1  2  2     1  2  2  1  1  2  2  1
D     1  2  2  1  1     2  1  1  2  2  1  1  2

Answer 2

You can use the fact that the values are only 1 and 2, and that the columns you want to modify are the ones whose second-level label modulo 4 is 1 or 2. Then you can use mask twice to replace the values in those columns where df is equal to 1 or 2.

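This only works as expected if, within each set of columns, every row has the same value in the remaining columns as in column 0. Otherwise you need to change m1 and m2 accordingly.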
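One way to lift that restriction, sketched under the same assumption that the values are only 1 and 2: use column 0 of each set as the reference instead of the current values. Transposing before grouping avoids groupby(axis=1), which is deprecated in recent pandas versions. In the sample frame below, row C of 'Set 1' is a made-up variation of the question's data (non-constant after column 0), included to show that only column 0 matters.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Set 1", i) for i in range(5)] + [("Set 2", i) for i in range(8)]
)
df = pd.DataFrame(
    [[1] * 13,
     [2] * 13,
     [2, 1, 2, 1, 2] + [1] * 8,  # deliberately non-constant in 'Set 1'
     [1] * 5 + [2] * 8],
    index=list("ABCD"),
    columns=cols,
)

# Column 0 of each set, repeated across that set's columns
first = df.T.groupby(level=0).transform("first").T

# True for columns whose second-level label modulo 4 is 1 or 2
mCol = np.isin(df.columns.get_level_values(1) % 4, [1, 2])

# Repeat the column mask for every row so it has the frame's shape,
# then replace the masked cells with the flipped first value
res_ = first.mask(np.tile(mCol, (first.shape[0], 1)), 3 - first)
print(res_)
```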

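import numpy as np

# True for columns whose second-level label modulo 4 is 1 or 2
mCol = np.isin(df.columns.get_level_values(1)%4, [1,2])
m1 = df.eq(1)
m2 = df.eq(2)
# in those columns, turn the original 1s into 2s and the original 2s into 1s
res_ = (df.mask(mCol&m1, 2)
          .mask(mCol&m2, 1))
print(res_)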
  Set 1             Set 2                     
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  2  2  1  1     1  2  2  1  1  2  2  1
B     2  1  1  2  2     2  1  1  2  2  1  1  2
C     2  1  1  2  2     1  2  2  1  1  2  2  1
D     1  2  2  1  1     2  1  1  2  2  1  1  2

Answer 3

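If I understand your question and dataset correctly, you want to flip the values in the columns whose second-level label belongs to the sequences 4 * n + 1 and 4 * n + 2, starting from n = 0. So just build these sequences, use query to slice out those columns and flip their values, then update the original frame.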

Build the sequence

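import numpy as np

# largest second-level label (7 here); reading it from level 1 directly
# does not depend on which set happens to sort last
n = df.columns.get_level_values(1).max()
m = (np.arange(n) * 4 + 1)           # 1, 5, 9, ...
a = m.tolist() + (m + 1).tolist()    # 1, 5, 9, ..., 2, 6, 10, ...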
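The sequences 4 * n + 1 and 4 * n + 2 pick out exactly the labels whose remainder modulo 4 is 1 or 2, which is the same test Answer 2 uses. A quick NumPy check, assuming the labels 0 to 7 of 'Set 2':

```python
import numpy as np

# Second-level labels of the largest set in the question ('Set 2': 0..7)
labels = np.arange(8)

# Sequence built as in the answer: 4*n + 1 and 4*n + 2
n = labels.max()
m = np.arange(n) * 4 + 1
a = m.tolist() + (m + 1).tolist()

# The same selection expressed with the remainder modulo 4
by_sequence = labels[np.isin(labels, a)].tolist()
by_modulo = labels[np.isin(labels % 4, [1, 2])].tolist()
print(by_sequence, by_modulo)  # [1, 2, 5, 6] [1, 2, 5, 6]
```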

Flip all the columns, then transpose and use query with ilevel_1 to select the columns to update. Finally, transpose back and update.

df.update((3 - df).T.query('ilevel_1 in @a').T)
print(df)

Or transpose, query, flip the values, transpose back and update:

df.update((3 - df.T.query('ilevel_1 in @a')).T)  
print(df)

Out[357]:
  Set 1             Set 2
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  2  2  1  1     1  2  2  1  1  2  2  1
B     2  1  1  2  2     2  1  1  2  2  1  1  2
C     2  1  1  2  2     1  2  2  1  1  2  2  1
D     1  2  2  1  1     2  1  1  2  2  1  1  2

If you don't want to use transpose and query, you can use IndexSlice with loc:

ix = pd.IndexSlice
df.update(3 - df.loc[:,ix[:, a]])
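A variant sketch without IndexSlice: build a boolean mask over the second column level and pass it to loc directly, so the selection never names labels that are not actually columns (the list a above also contains values such as 9 or 13). The frame is rebuilt here so the block runs on its own; it assumes the values are only 1 and 2.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Set 1", i) for i in range(5)] + [("Set 2", i) for i in range(8)]
)
df = pd.DataFrame(
    [[1] * 13, [2] * 13, [2] * 5 + [1] * 8, [1] * 5 + [2] * 8],
    index=list("ABCD"),
    columns=cols,
)

# True where the second level is 4*n + 1 or 4*n + 2
flip = np.isin(df.columns.get_level_values(1) % 4, [1, 2])

# Flip 1 <-> 2 in the selected columns only
df.loc[:, flip] = 3 - df.loc[:, flip]
print(df)
```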

Out[399]:
  Set 1             Set 2
      0  1  2  3  4     0  1  2  3  4  5  6  7
A     1  2  2  1  1     1  2  2  1  1  2  2  1
B     2  1  1  2  2     2  1  1  2  2  1  1  2
C     2  1  1  2  2     1  2  2  1  1  2  2  1
D     1  2  2  1  1     2  1  1  2  2  1  1  2

How to alternate values based on the previous value in a pandas MultiIndex?
