I'm learning Python. I want to compute the correlation between values. My data is a dictionary.
My_data = {1: [1450.0, -80.0, 840.0, -220.0, 630.0, 780.0, -1140.0], 2: [1450.0, -80.0, 840.0, -220.0, 630.0, 780.0, -1140.0], 3: [720.0, -230.0, 460.0, 220.0, 710.0, -460.0, 90.0]}
This is the output I expect.
     1     2     3
1    1  0.69  0.77
2          1  0.54
3                1
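This is the code I tried. I get TypeError: unsupported operand type(s) for /: 'list' and 'long'. I'm not sure what went wrong. I would appreciate it if someone could explain the error and help me find a proper solution.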
my_array=np.array(My_data .values())
Correlation = np.corrcoef(my_array,my_array)
Answer 0 (score: 1)
Using pandas (which is built on top of numpy), you can do it as follows:
In [55]: import pandas as pd
In [56]: df = pd.DataFrame.from_dict(My_data, orient='index').T
In [57]: df.corr(method='pearson')
Out[57]:
          1         2         3
1  1.000000  1.000000  0.384781
2  1.000000  1.000000  0.121978
3  0.384781  0.121978  1.000000
In [58]: df.corr(method='kendall')
Out[58]:
          1         2         3
1  1.000000  1.000000  0.333333
2  1.000000  1.000000  0.240385
3  0.333333  0.240385  1.000000
In [59]: df.corr(method='spearman')
Out[59]:
          1        2         3
1  1.000000  1.00000  0.464286
2  1.000000  1.00000  0.327370
3  0.464286  0.32737  1.000000
The following line creates a pandas.DataFrame from the My_data dictionary:
df = pd.DataFrame.from_dict(My_data, orient='index').T
It looks like this:
In [60]: df
Out[60]:
         1       2       3
0   1450.0  1450.0   720.0
1    -80.0   -80.0  -230.0
2    840.0   840.0   460.0
3   -220.0  -220.0   220.0
4    630.0   630.0   710.0
5    780.0   780.0  -460.0
6  -1140.0 -1140.0    90.0
7      NaN   450.0  -640.0
8      NaN   730.0   870.0
9      NaN  -810.0  -290.0
10     NaN   390.0 -2180.0
11     NaN  -220.0  -790.0
12     NaN -1640.0    65.0
13     NaN  -590.0    70.0
14     NaN  -145.0   460.0
15     NaN  -420.0     NaN
16     NaN   620.0     NaN
17     NaN   450.0     NaN
18     NaN   -90.0     NaN
19     NaN   990.0     NaN
20     NaN  -705.0     NaN
Then df.corr() computes the pairwise correlations between the columns.
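For reference, here is a minimal self-contained sketch of the same approach as a plain script. It uses the dictionary exactly as written in the question, where all three lists have seven values, so the numbers it prints will not match the session above (that session was evidently run on longer lists of unequal length). The `ragged` dictionary and the upper-triangle masking step are illustrative additions of mine, not part of the original answer; the masking is only there because the question asks for a triangular table.

```python
import numpy as np
import pandas as pd

# Dictionary as written in the question (keys 1 and 2 hold identical lists).
My_data = {
    1: [1450.0, -80.0, 840.0, -220.0, 630.0, 780.0, -1140.0],
    2: [1450.0, -80.0, 840.0, -220.0, 630.0, 780.0, -1140.0],
    3: [720.0, -230.0, 460.0, 220.0, 710.0, -460.0, 90.0],
}

# orient='index' makes each key a row; .T turns the keys into columns.
df = pd.DataFrame.from_dict(My_data, orient='index').T

# Pairwise Pearson correlation between the columns.
corr = df.corr(method='pearson')
print(corr)

# Optional: keep only the upper triangle (diagonal included),
# similar to the layout the question asks for.
mask = np.triu(np.ones(corr.shape, dtype=bool))
upper = corr.where(mask)
print(upper)

# Hypothetical example with lists of unequal length: the shorter list is
# padded with NaN, and corr() uses only the rows where both columns have values.
ragged = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 7.0]}
ragged_df = pd.DataFrame.from_dict(ragged, orient='index').T
ragged_corr = ragged_df.corr()
print(ragged_corr)
```

Because pandas pads missing entries with NaN and skips them pair by pair, this route should also work when the lists in the dictionary differ in length.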
First you need to convert the data to a numpy.ndarray, and then you can compute the correlation like this:
In [91]: np.corrcoef(np.asarray(new_data.values()))
Out[91]:
array([[ 1.        ,  1.        ,  0.38478131],
       [ 1.        ,  1.        ,  0.38478131],
       [ 0.38478131,  0.38478131,  1.        ]])
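A rough sketch of the NumPy route for Python 3, assuming every list in the dictionary has the same length (as in the question's My_data). In Python 3, `dict.values()` returns a view, so it is wrapped in `list()` before conversion. `np.corrcoef` treats each row as one variable. As for the error in the question, my best guess is that the real lists had different lengths, so NumPy (on Python 2) built an array of list objects rather than a 2-D numeric array, and taking the mean then tried to divide a list by a number; I can't confirm that from the post alone.

```python
import numpy as np

# Dictionary as written in the question; all lists have equal length.
My_data = {
    1: [1450.0, -80.0, 840.0, -220.0, 630.0, 780.0, -1140.0],
    2: [1450.0, -80.0, 840.0, -220.0, 630.0, 780.0, -1140.0],
    3: [720.0, -230.0, 460.0, 220.0, 710.0, -460.0, 90.0],
}

# list(...) is needed on Python 3, where values() is a view, not a list.
my_array = np.asarray(list(My_data.values()), dtype=float)
print(my_array.shape)  # 3 rows (variables), 7 observations each

# Each row is one variable, so the result is a 3x3 matrix.
corr = np.corrcoef(my_array)
print(corr)

# Passing the array twice, as in the question, stacks the variables
# and yields a 6x6 matrix, which is probably not what was intended.
stacked = np.corrcoef(my_array, my_array)
print(stacked.shape)
```

If the lists differ in length, this approach does not apply directly; the pandas version handles that case by padding with NaN.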