Answer 0 (score: 2)
The simplest way is to use pearsonr from scipy.stats:
import numpy as np
from scipy.stats import pearsonr
x = np.random.random(20)
y = np.random.random(20)
print(pearsonr(x, y))
This gives you two values: the correlation coefficient and the p-value.
You can also implement it yourself like this:
x = np.random.random(20)
y = np.random.random(20)
x_bar = np.mean(x)
y_bar = np.mean(y)
top = np.sum((x - x_bar) * (y - y_bar))
bot = np.sqrt(np.sum(np.power(x - x_bar, 2)) * np.sum(np.power(y - y_bar, 2)))
print(top/bot)
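Both approaches give the same result when applied to the same x and y (the snippets above draw fresh random data on each run). Good luck!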
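To check that claim on a single data set, here is a minimal sketch that runs the hand-written formula on one fixed pair of arrays and compares it with a reference value. It assumes NumPy's np.corrcoef as the reference (so SciPy is not needed); the seed value is arbitrary.

```python
import numpy as np

# Fixed seed so the run is repeatable; the seed value itself is arbitrary.
rng = np.random.default_rng(0)
x = rng.random(20)
y = rng.random(20)

# Hand-written Pearson formula, same steps as in the snippet above.
x_bar = np.mean(x)
y_bar = np.mean(y)
top = np.sum((x - x_bar) * (y - y_bar))
bot = np.sqrt(np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))
r_manual = top / bot

# Reference value: off-diagonal entry of NumPy's correlation matrix.
r_numpy = np.corrcoef(x, y)[0, 1]

# The two values should agree up to floating-point rounding.
print(r_manual, r_numpy, np.isclose(r_manual, r_numpy))
```

Comparing with np.isclose rather than == avoids false mismatches caused by rounding differences between the two computations.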
Answer 1 (score: 1)
A straightforward implementation using for loops would be:
import math
def correlation(x, y):
    # Check the lengths first; zip() below would silently truncate otherwise.
    assert len(x) == len(y)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Sums of squared deviations (the 1/n factors cancel in the final ratio).
    var_x = sum((x_i - x_bar)**2 for x_i in x)
    var_y = sum((y_i - y_bar)**2 for y_i in y)
    numerator = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    denominator = math.sqrt(var_x * var_y)
    return numerator / denominator

if __name__ == "__main__":
    x = [...]
    y = [...]
    print(correlation(x, y))
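As a quick sanity check of the pure-Python version, the sketch below repeats the same formula and feeds it small hand-made lists. The sample data is hypothetical, chosen so that the expected answers are obvious: a perfectly increasing linear relation should give a value close to 1, and a perfectly decreasing one a value close to -1.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equal-length sequences (same formula as above)."""
    assert len(x) == len(y)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y)
    )
    return numerator / denominator

# Hypothetical sample data, not taken from the question.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0 * v + 1.0 for v in x]   # perfectly increasing linear relation
y_down = [-3.0 * v for v in x]      # perfectly decreasing linear relation

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
print(r_up, r_down)
```

Note that if either input is constant, the denominator is zero and this function raises ZeroDivisionError, so real code may want to handle that case explicitly.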
When doing a lot of numerical computation, the numpy module is commonly used, and it already defines this function:
import numpy as np
if __name__ == "__main__":
    x = np.array([...])
    y = np.array([...])
    print(np.corrcoef(x, y)[0, 1])
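The [0, 1] index is needed because np.corrcoef returns a correlation matrix rather than a single number. A small sketch with hypothetical sample data shows the layout: for two 1-D inputs the result is a 2x2 symmetric matrix with ones on the diagonal, and the x-y correlation sits in the off-diagonal entries.

```python
import numpy as np

# Hypothetical, nearly linear sample data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

m = np.corrcoef(x, y)

print(m.shape)            # two variables give a 2x2 matrix
print(np.diag(m))         # each variable correlated with itself
print(m[0, 1], m[1, 0])   # correlation between x and y, stored symmetrically
```

Either m[0, 1] or m[1, 0] can be used to read off the coefficient.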