Question

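Suppose I have the following DataFrame:

I want to perform a calculation between every row and every other row.
For example, if the calculation were lambda r1, r2: abs(r1-r2), the output would be (in some order)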

id       col_name
1        10
2        200
3        3000
4        190
5        2990
6        2800

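Questions:

1. How do I get just the output above?
2. How do I link each result back to the rows that produced it, in the most "pandas-like" way?

I'd like to keep everything in a single table as far as possible, in a way that still supports reasonable lookups.

My data is not large, and it never will be.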

EDIT1:

One way to answer my question 2 would be

id       col_name    origin1    origin2
1        10          1          2
2        200         1          3
3        3000        1          4
4        190         2          3
5        2990        2          4
6        2800        3          4

I'd like to know whether this is standard, whether there is a built-in way to do it, or whether there are other/better approaches.

Answer 1

Use broadcasted subtraction, then np.tril_indices_from to extract the entries below the diagonal (the positive values here).

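import numpy as np
import pandas as pd

# pandas <= 0.23:
# u = df['A'].values
# pandas 0.24+:
u = df['A'].to_numpy()
u2 = u[:, None] - u  # pairwise differences: u2[i, j] = u[i] - u[j]

# keep only the entries strictly below the diagonal (i > j)
pd.Series(u2[np.tril_indices_from(u2, k=-1)])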

0      10
1     200
2     190
3    3000
4    2990
5    2800
dtype: int64
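
The question's DataFrame is not shown above, so here is a minimal self-contained sketch of the same idea. The values in column A are an assumption, chosen only so that the differences match the numbers in the question, and np.abs is added to mirror the question's lambda r1, r2: abs(r1-r2).

```python
import numpy as np
import pandas as pd

# Hypothetical data: chosen only so that the pairwise differences
# match the numbers shown in the question.
df = pd.DataFrame({"A": [0, 10, 200, 3000]})

u = df["A"].to_numpy()
# u[:, None] has shape (n, 1) and u has shape (n,), so broadcasting
# yields an (n, n) matrix with diffs[i, j] = |u[i] - u[j]|.
diffs = np.abs(u[:, None] - u)

# Indices strictly below the diagonal (i > j): each unordered pair once.
rows, cols = np.tril_indices_from(diffs, k=-1)
result = pd.Series(diffs[rows, cols])
print(result.tolist())
```

Taking the absolute value makes the result independent of whether the column happens to be sorted; without it, the lower triangle is only positive when the values are increasing.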

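Alternatively, use np.subtract.outer to build the same difference matrix in one call. The original one-liner passed the Series directly, np.subtract.outer(*[df.A]*2); newer pandas releases may reject ufunc.outer on a Series, so the version below passes NumPy arrays instead.

u = df['A'].to_numpy()
u2 = np.subtract.outer(u, u)
pd.Series(u2[np.tril_indices_from(u2, k=-1)])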

If you also need the indices, use

idx = np.tril_indices_from(u2, k=-1)
pd.DataFrame({
    'val': u2[idx],
    'row': idx[0],
    'col': idx[1]
})

    val  row  col
0    10    1    0
1   200    2    0
2   190    2    1
3  3000    3    0
4  2990    3    1
5  2800    3    2
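
To get something shaped like the table in the question's EDIT1, one option is to take the upper triangle instead and map the positions back to index labels. This is only a sketch: the data and the 1-based index labels are assumptions, and the column names origin1/origin2 are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data and index labels (1..4), assumed for illustration.
df = pd.DataFrame({"A": [0, 10, 200, 3000]}, index=[1, 2, 3, 4])

u = df["A"].to_numpy()
diffs = np.abs(u[:, None] - u)

# The upper triangle (i < j) walks the pairs in the order
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
i, j = np.triu_indices_from(diffs, k=1)
labels = df.index.to_numpy()
pairs = pd.DataFrame({
    "col_name": diffs[i, j],
    "origin1": labels[i],  # map positions back to index labels
    "origin2": labels[j],
})
print(pairs)
```

A single pair can then be looked up with an ordinary boolean mask, for example pairs[(pairs["origin1"] == 2) & (pairs["origin2"] == 4)].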

Answer 2

IIUC, you can use itertools:

import itertools

s = list(itertools.combinations(df.index, 2))
pd.Series([df.A.loc[x[1]] - df.A.loc[x[0]] for x in s])
Out[495]: 
0      10
1     200
2    3000
3     190
4    2990
5    2800
dtype: int64

Update

s = list(itertools.combinations(df.index, 2))

pd.DataFrame([x + (df.A.loc[x[1]] - df.A.loc[x[0]],) for x in s])
Out[518]: 
   0  1     2
0  0  1    10
1  0  2   200
2  0  3  3000
3  1  2   190
4  1  3  2990
5  2  3  2800
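
If the calculation is an arbitrary Python function rather than a subtraction, the itertools approach generalises fairly directly. The sketch below wraps it in a helper; pairwise_apply is a hypothetical name (not a pandas function), the data is assumed, and it relies on the index labels being unique.

```python
import itertools
import pandas as pd

def pairwise_apply(series, func):
    """Apply func to every unordered pair of values in series.

    Hypothetical helper, not part of pandas. Returns a Series indexed by
    an (origin1, origin2) MultiIndex built from the original index labels.
    Assumes the index labels are unique.
    """
    pairs = list(itertools.combinations(series.index, 2))
    values = [func(series.loc[a], series.loc[b]) for a, b in pairs]
    index = pd.MultiIndex.from_tuples(pairs, names=["origin1", "origin2"])
    return pd.Series(values, index=index, name=series.name)

# Hypothetical data, assumed for illustration.
df = pd.DataFrame({"A": [0, 10, 200, 3000]}, index=[1, 2, 3, 4])
out = pairwise_apply(df["A"], lambda r1, r2: abs(r1 - r2))

print(out.loc[(2, 4)])    # look up a single pair by its origin labels
print(out.reset_index())  # flat table with origin1, origin2 and A columns
```

Keeping the origins in a MultiIndex leaves everything in one object while still allowing direct lookups; reset_index() turns it into the flat layout from EDIT1. For small data the Python-level loop should be fine, and the broadcasting version remains the faster choice when the operation is a NumPy ufunc.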
