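Get the row and column index pairs of a pandas DataFrame that match some condition

Time: 2014-03-05 06:20:27

Tags: python pandas dataframe

Suppose I have a pandas DataFrame like the one below. The values are based on a distance matrix.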

import numpy as np
import pandas as pd

A = pd.DataFrame([(1.0,0.8,0.6708203932499369,0.6761234037828132,0.7302967433402214),
                  (0.8,1.0,0.6708203932499369,0.8451542547285166,0.9128709291752769),
                  (0.6708203932499369,0.6708203932499369,1.0,0.5669467095138409,0.6123724356957946),
                  (0.6761234037828132,0.8451542547285166,0.5669467095138409,1.0,0.9258200997725514),
                  (0.7302967433402214,0.9128709291752769,0.6123724356957946,0.9258200997725514,1.0)
                  ])

Output:

Out[65]: 
          0         1         2         3         4
0  1.000000  0.800000  0.670820  0.676123  0.730297
1  0.800000  1.000000  0.670820  0.845154  0.912871
2  0.670820  0.670820  1.000000  0.566947  0.612372
3  0.676123  0.845154  0.566947  1.000000  0.925820
4  0.730297  0.912871  0.612372  0.925820  1.000000

I only want the upper triangle, so I set the lower triangle and the diagonal to NaN.

c2 = A.copy()
c2.values[np.tril_indices_from(c2)] = np.nan

Output:

Out[67]: 

        0    1        2         3         4
    0 NaN  0.8  0.67082  0.676123  0.730297
    1 NaN  NaN  0.67082  0.845154  0.912871
    2 NaN  NaN      NaN  0.566947  0.612372
    3 NaN  NaN      NaN       NaN  0.925820
    4 NaN  NaN      NaN       NaN       NaN
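
As an aside, the same upper-triangle frame can be built without assigning through `.values`, which only works when pandas hands back a writable view of the data. The following is a minimal sketch, assuming a small hypothetical 3x3 matrix rather than the data above; it combines `np.triu` with `DataFrame.where`.

```python
import numpy as np
import pandas as pd

# Small symmetric example matrix (hypothetical values, not the question's data).
A = pd.DataFrame([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.9],
                  [0.6, 0.9, 1.0]])

# Boolean mask that is True strictly above the diagonal (k=1 skips the diagonal).
upper = np.triu(np.ones(A.shape, dtype=bool), k=1)

# where() keeps values where the mask is True and puts NaN everywhere else.
c2 = A.where(upper)
print(c2)
```

This leaves `A` untouched and returns a new frame, so no explicit `copy()` is needed.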

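Now I want to get the row and column index pairs that satisfy some condition. For example: get the row and column indices of the values greater than 0.8. For this, the output should be [1,3],[1,4],[3,4]. How can I do this?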

1 Answer:

Answer 0: (score: 3)

You can use numpy's argwhere:

In [11]: np.argwhere(c2 > 0.8)
Out[11]: 
array([[1, 3],
       [1, 4],
       [3, 4]])
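
For a version that runs on its own, here is a sketch that rebuilds the question's matrix and applies `np.argwhere` to the underlying array obtained with `to_numpy()`. Passing the array rather than the DataFrame is my own choice, not part of the answer; it avoids depending on how a particular NumPy/pandas combination handles a DataFrame argument. Comparisons against NaN evaluate to False, so the masked lower triangle never matches.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([
    (1.0, 0.8, 0.6708203932499369, 0.6761234037828132, 0.7302967433402214),
    (0.8, 1.0, 0.6708203932499369, 0.8451542547285166, 0.9128709291752769),
    (0.6708203932499369, 0.6708203932499369, 1.0, 0.5669467095138409, 0.6123724356957946),
    (0.6761234037828132, 0.8451542547285166, 0.5669467095138409, 1.0, 0.9258200997725514),
    (0.7302967433402214, 0.9128709291752769, 0.6123724356957946, 0.9258200997725514, 1.0),
])

# Work on an explicit copy of the data so the NaN assignment cannot touch A.
vals = A.to_numpy(copy=True)
vals[np.tril_indices_from(vals)] = np.nan  # blank out lower triangle and diagonal
c2 = pd.DataFrame(vals, index=A.index, columns=A.columns)

# argwhere returns one [row_position, column_position] pair per True entry.
pairs = np.argwhere(c2.to_numpy() > 0.8)
print(pairs.tolist())  # [[1, 3], [1, 4], [3, 4]]
```

Note that the entry at position (0, 1) is exactly 0.8, so the strict comparison `> 0.8` excludes it, which matches the expected output in the question.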

To get the index/column labels (rather than their integer positions), you can use a list comprehension:

[(c2.index[i], c2.columns[j]) for i, j in np.argwhere(c2 > 0.8)]
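
The difference between positions and labels only shows up when the frame has non-default labels. The sketch below assumes a hypothetical 3x3 frame labelled "a", "b", "c" (not the question's data) and uses `np.where` together with positional indexing on `index` and `columns`, as one alternative to the list comprehension.

```python
import numpy as np
import pandas as pd

labels = ["a", "b", "c"]  # hypothetical labels, for illustration only
c2 = pd.DataFrame([[np.nan, 0.85, 0.60],
                   [np.nan, np.nan, 0.90],
                   [np.nan, np.nan, np.nan]],
                  index=labels, columns=labels)

# np.where on a 2-D boolean array returns the row and column positions separately.
rows, cols = np.where(c2.to_numpy() > 0.8)

# Translate the integer positions into index/column labels.
label_pairs = list(zip(c2.index[rows], c2.columns[cols]))
print(label_pairs)  # [('a', 'b'), ('b', 'c')]
```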