Question

After reading a .csv file, I have a pandas DataFrame similar to this one:

import itertools as it
import pandas as pd
import numpy as np
import scipy as sp

x = np.random.randn(5)
y = np.sin(x)
z = np.sin(x)+1
df = pd.DataFrame({'x':x, 'y':y, 'z':z})

df = 
          x         y         z
0  0.233070  0.230965  1.230965
1 -1.956269 -0.926621  0.073379
2 -0.015575 -0.015575  0.984425
3 -0.106887 -0.106684  0.893316
4 -0.510168 -0.488324  0.511676

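I want to use itertools.combinations and scipy.spatial.distance.euclidean to compute the pairwise Euclidean distances, and store the values either by extending df or in a new DataFrame. For example, the extended df would look something like this (x.xxxxxx stands for the values to be computed):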

df = 
          x         y         z        x-y        x-z        y-z
0  0.233070  0.230965  1.230965   x.xxxxxx   x.xxxxxx   x.xxxxxx
1 -1.956269 -0.926621  0.073379   x.xxxxxx   x.xxxxxx   x.xxxxxx
2 -0.015575 -0.015575  0.984425   x.xxxxxx   x.xxxxxx   x.xxxxxx
3 -0.106887 -0.106684  0.893316   x.xxxxxx   x.xxxxxx   x.xxxxxx
4 -0.510168 -0.488324  0.511676   x.xxxxxx   x.xxxxxx   x.xxxxxx

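The actual dataset I am working with is large, so I am looking for an efficient, Pythonic way to do this. I only need the unique pairwise comparisons, so I want to avoid the n-way comparisons that itertools.combinations includes (here that would be x-y-z) as well as duplicates (e.g. y-x, z-x, z-y). I hope this is clear; thanks for your help.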
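To make the target concrete, here is a minimal sketch of one way the computation could look. It rests on an assumption: since each cell holds a single number, the "Euclidean distance" between two columns in a given row is taken to be the absolute difference of the two values. Passing r=2 to itertools.combinations yields only unordered pairs, so neither the three-way x-y-z combination nor reversed duplicates such as y-x appear. The names pairs, dist and extended are illustrative, and a seeded generator replaces np.random.randn only to keep the example reproducible.

```python
import itertools as it

import numpy as np
import pandas as pd

# Sample data shaped like the frame in the question (seeded for reproducibility)
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
df = pd.DataFrame({'x': x, 'y': np.sin(x), 'z': np.sin(x) + 1})

# r=2 restricts the result to unordered pairs of column names:
# ('x', 'y'), ('x', 'z'), ('y', 'z')
pairs = list(it.combinations(df.columns, 2))

# Assumption: for scalar entries the Euclidean distance is |a - b|,
# computed here column-wise so every row is handled in one vectorized step
dist = pd.DataFrame(
    {f'{a}-{b}': (df[a] - df[b]).abs() for a, b in pairs},
    index=df.index,
)

# Either keep dist as a separate DataFrame or attach it to the original
extended = df.join(dist)
print(extended)
```

Because z is defined as y + 1, the y-z column should be 1 in every row, which gives a quick sanity check on the result.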

Computing pairwise Euclidean distance values in a pandas DataFrame using itertools.combinations

0 answers: