Question

I have the following pandas DataFrame:

    Cmpd1   Cmpd2   Cmpd3   Cmpd4   Cmpd5   Cmpd6
Cmpd1   1                   
Cmpd2   0.4   1             
Cmpd3   0.6   0.3   1           
Cmpd4   0.46  0.69  0.32    1       
Cmpd5   0.57  0.44  0.41    0.51    1   
Cmpd6   0.41  0.79  0.33    0.56    0.43    1

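I would like to sort its values from highest to lowest and label each value with its row index, regardless of whether an index label repeats. To illustrate:

The highest value corresponds to Cmpd6 = 0.79, then Cmpd4 = 0.69, ... and at some point Cmpd6 = 0.56. I would like to end up with a list like this: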

Cmpd6 = 0.79
Cmpd4 = 0.69
Cmpdx = x
Cmpd6 = 0.56

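That is, one entry for each value in the array, no matter how many times an index label is repeated. I tried .sort_index(axis=1), but it does not produce the result I want. I also tried .ravel(), but that does not give me the index. How can I solve this?

Thanks!

My solution: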

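import numpy as np

# Keep only the cells strictly below the diagonal (k=-1 drops the 1s).
# Built-in bool is used because the np.bool alias was removed in NumPy 1.24.
df = df.where(np.tril(np.ones(df.shape), -1).astype(bool))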
df = df.stack().reset_index().drop("level_1", axis=1).sort_values(by=0, ascending=False)
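
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the masking approach above. It assumes the blank cells of the table are NaN (the question does not say how they are stored), and the names `labels`, `rows`, `mask`, and `ranked` are only illustrative. An explicit `.dropna()` is added after `.stack()` so that the result does not depend on whether the installed pandas version drops missing values while stacking.

```python
import numpy as np
import pandas as pd

labels = [f"Cmpd{i}" for i in range(1, 7)]

# Lower-triangle values copied from the question.
rows = [
    [1],
    [0.4, 1],
    [0.6, 0.3, 1],
    [0.46, 0.69, 0.32, 1],
    [0.57, 0.44, 0.41, 0.51, 1],
    [0.41, 0.79, 0.33, 0.56, 0.43, 1],
]

# Assumption: the blank upper-triangle cells are NaN.
df = pd.DataFrame(
    [r + [np.nan] * (len(labels) - len(r)) for r in rows],
    index=labels,
    columns=labels,
)

# True only strictly below the diagonal, so the 1s on the diagonal are excluded.
mask = np.tril(np.ones(df.shape, dtype=bool), k=-1)

ranked = (
    df.where(mask)                     # everything outside the mask becomes NaN
    .stack()                           # long format: (row label, column label) -> value
    .dropna()                          # make the NaN removal explicit
    .reset_index(level=1, drop=True)   # keep only the row label
    .sort_values(ascending=False)
)

print(ranked.head())
```

The result is a Series whose index may repeat, which is the shape of list asked for above.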

Answer 1

I assume that each empty element in the source df actually contains an empty string (not NaN, because those would be printed as NaN).
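
If you are unsure which case applies to your data, a quick check along these lines may help. This is only a sketch: `describe_blanks` is a hypothetical helper written for this answer, not a pandas function, and `toy` is a made-up two-row frame rather than your data.

```python
import pandas as pd


def describe_blanks(frame):
    """Count NaN cells and blank-string cells in a DataFrame."""
    n_nan = int(frame.isna().sum().sum())
    # A cell counts as blank if it is a string that is empty or only whitespace.
    is_blank = frame.apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "")
    )
    n_blank = int(is_blank.sum().sum())
    return {"nan": n_nan, "blank_str": n_blank}


# Made-up example: one blank string in the upper-right cell.
toy = pd.DataFrame({"a": [1.0, 0.4], "b": ["", 1.0]}, index=["Cmpd1", "Cmpd2"])
print(describe_blanks(toy))  # {'nan': 0, 'blank_str': 1}
```

A non-zero `blank_str` count means the empty-string assumption holds; a non-zero `nan` count means the cells are already missing values.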

I also noticed that you want to keep only the values < 1.

To get the result, run:

s = df.stack()
s[s.apply(lambda x: type(x) is not str and x < 1)]\
    .reset_index(level=1, drop=True).sort_values(ascending=False)\
    .astype(float)

For your data, the result is:

Cmpd6    0.79
Cmpd4    0.69
Cmpd3    0.60
Cmpd5    0.57
Cmpd6    0.56
Cmpd5    0.51
Cmpd4    0.46
Cmpd5    0.44
Cmpd6    0.43
Cmpd6    0.41
Cmpd5    0.41
Cmpd2    0.40
Cmpd6    0.33
Cmpd4    0.32
Cmpd3    0.30
dtype: float64

Another possible solution:

s = df.replace(r'^\s*$', np.nan, regex=True).stack()\
    .reset_index(level=1, drop=True).sort_values(ascending=False)
s[s < 1]
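
To try the second variant end to end, here is a minimal sketch under the same assumption as above, namely that the blank cells are empty strings. The names `labels`, `rows`, and `result` are illustrative, and `.dropna()` and `.astype(float)` are added so that the outcome does not depend on how a particular pandas version handles missing values or object columns while stacking.

```python
import numpy as np
import pandas as pd

labels = [f"Cmpd{i}" for i in range(1, 7)]

# Lower-triangle values copied from the question.
rows = [
    [1],
    [0.4, 1],
    [0.6, 0.3, 1],
    [0.46, 0.69, 0.32, 1],
    [0.57, 0.44, 0.41, 0.51, 1],
    [0.41, 0.79, 0.33, 0.56, 0.43, 1],
]

# Assumption: the blank upper-triangle cells are empty strings.
df = pd.DataFrame(
    [r + [""] * (len(labels) - len(r)) for r in rows],
    index=labels,
    columns=labels,
)

s = (
    df.replace(r"^\s*$", np.nan, regex=True)  # blank strings -> NaN
    .stack()
    .dropna()                                 # make the NaN removal explicit
    .astype(float)                            # object -> float before sorting
    .reset_index(level=1, drop=True)          # keep only the row label
    .sort_values(ascending=False)
)

result = s[s < 1]  # drop the 1s from the diagonal
print(result)
```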
