Question

Given a DataFrame like the following:

import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 0, 1, 2, 3, 4),
                   (1, 2, 0, 1, 2, 3, 4, 5, 6),
                   (1, 2, 3, 0, 1, 2, 3, 0, 1)],
                  columns=['P' + str(i + 1) for i in range(9)],
                  index=['row1', 'row2', 'row3'])

The resulting df:

        P1  P2  P3  P4  P5  P6  P7  P8  P9
row1    1   2   3   4   0   1   2   3   4
row2    1   2   0   1   2   3   4   5   6
row3    1   2   3   0   1   2   3   0   1

I want to find out whether the maximum value of a row occurs more than once in that row. For example:

 >>> df.max(axis=1)
 row1    4
 row2    6
 row3    3
 dtype: int64

Here row1 and row3 contain their maximum value more than once.

Ideally the solution should be vectorized, since I have 40,000 rows and 50 columns.

Answer 1

Use eq to compare every value with its row maximum, count the True values per row with sum, and finally filter the index with boolean indexing.

Details:

a = df.eq(df.max(axis=1),axis=0).sum(axis=1)
print (a)
row1    2
row2    1
row3    2
dtype: int64

b = a.index[a > 1]
print (b)
Index(['row1', 'row3'], dtype='object')
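
The two steps above can be wrapped in a small helper. This is only a minimal sketch: the function name `rows_with_repeated_max` is hypothetical (it is not part of pandas), and it assumes every column is numeric.

```python
import pandas as pd


def rows_with_repeated_max(frame: pd.DataFrame) -> pd.Index:
    """Return the labels of rows whose maximum occurs more than once."""
    # Compare every cell with the maximum of its own row (aligned on the index).
    is_max = frame.eq(frame.max(axis=1), axis=0)
    # Count the True values per row and keep the rows with more than one.
    counts = is_max.sum(axis=1)
    return counts.index[counts > 1]


df = pd.DataFrame([(1, 2, 3, 4, 0, 1, 2, 3, 4),
                   (1, 2, 0, 1, 2, 3, 4, 5, 6),
                   (1, 2, 3, 0, 1, 2, 3, 0, 1)],
                  columns=['P' + str(i + 1) for i in range(9)],
                  index=['row1', 'row2', 'row3'])

print(rows_with_repeated_max(df).tolist())  # ['row1', 'row3']
```

Everything here is a column-wise or row-wise pandas operation, so there is no Python-level loop over the rows.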

For reference, the boolean mask that eq produces before the sum:

print (df.eq(df.max(axis=1),axis=0))
         P1     P2     P3     P4     P5     P6     P7     P8     P9
row1  False  False  False   True  False  False  False  False   True
row2  False  False  False  False  False  False  False  False   True
row3  False  False   True  False  False  False   True  False  False
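
The same comparison can also be done directly on the underlying NumPy array, which may reduce pandas overhead on a frame with tens of thousands of rows. The following is a sketch under stated assumptions, not the answer's original code: it assumes all columns share one numeric dtype and contain no NaN values, and the variable names `arr`, `row_max` and `repeated` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 0, 1, 2, 3, 4),
                   (1, 2, 0, 1, 2, 3, 4, 5, 6),
                   (1, 2, 3, 0, 1, 2, 3, 0, 1)],
                  columns=['P' + str(i + 1) for i in range(9)],
                  index=['row1', 'row2', 'row3'])

# Work on the raw 2-D array (assumes a single numeric dtype).
arr = df.to_numpy()

# Row maxima with shape (n_rows, 1) so they broadcast against arr.
row_max = arr.max(axis=1, keepdims=True)

# True where a row holds its maximum more than once.
repeated = (arr == row_max).sum(axis=1) > 1

print(df.index[repeated].tolist())  # ['row1', 'row3']
```

`keepdims=True` keeps the maxima as a column vector, so the `==` comparison broadcasts across each row without an explicit reshape.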
