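I have a list of correlations generated from this text file:
(the first two values indicate which pair of points the correlation is between)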
2 1 -0.798399811877855E-01
3 1 0.357718108972297E+00
3 2 -0.406142457763738E+00
4 1 0.288467030571132E+00
4 2 -0.129115034405361E+00
4 3 0.156739504479856E+00
5 1 -0.756332254716083E-01
5 2 0.479036971438800E+00
5 3 -0.377545460300584E+00
5 4 -0.265467953118191E+00
6 1 0.909003414436468E-01
6 2 -0.363568902645620E+00
6 3 0.482042347959232E+00
6 4 0.292931692897587E+00
6 5 -0.739868576924150E+00
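I already have another list containing the standard deviation for each point. How can I combine the two in numpy / scipy to create a covariance matrix?
It needs to be a very efficient method, because there are 300 points, so ~50 000 correlations.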
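Answer 0 (score: 2)
Assuming this table is named df, with the first column labeled A, the second column labeled B, and the correlation values labeled Correlation: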
df2 = df.pivot(index='A', columns='B', values='Correlation')
>>> df2
B 1 2 3 4 5
A
2 -0.0798 NaN NaN NaN NaN
3 0.3580 -0.406 NaN NaN NaN
4 0.2880 -0.129 0.157 NaN NaN
5 -0.0756 0.479 -0.378 -0.265 NaN
6 0.0909 -0.364 0.482 0.293 -0.74
Convert it to a symmetric square matrix with ones on the diagonal:
import numpy as np
import pandas as pd
# Get a unique, sorted list of all items in rows and columns.
items = list(df2)
items.extend(list(df2.index))
items = sorted(set(items))  # sort: a plain set does not guarantee label order
# Create square symmetric correlation matrix
corr = df2.values.tolist()
corr.insert(0, [np.nan] * len(corr))  # add the missing first row
corr = pd.DataFrame(corr)
corr[len(corr) - 1] = [np.nan] * len(corr)  # add the missing last column
for i in range(len(corr)):
    corr.iat[i, i] = 1.  # Set diagonal to 1.00
    corr.iloc[i, i:] = corr.iloc[i:, i].values  # Flip matrix.
# Rename rows and columns.
corr.index = items
corr.columns = items
>>> corr
1 2 3 4 5 6
1 1.0000 -0.0798 0.358 0.288 -0.0756 0.0909
2 -0.0798 1.0000 -0.406 -0.129 0.4790 -0.3640
3 0.3580 -0.4060 1.000 0.157 -0.3780 0.4820
4 0.2880 -0.1290 0.157 1.000 -0.2650 0.2930
5 -0.0756 0.4790 -0.378 -0.265 1.0000 -0.7400
6 0.0909 -0.3640 0.482 0.293 -0.7400 1.0000
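If the std dev data is not already in matrix form, follow the same steps to build it.
Assuming this matrix is named df_std, you can then obtain the covariance matrix as follows: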
df_cov = corr.multiply(df_std.multiply(df_std.T.values))
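If the standard deviations are held as a one-dimensional vector rather than a matrix, the same scaling can be written with an outer product. The following is only a minimal sketch: the three-point `corr` frame, the `std` Series, and its values are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-point example (values invented for illustration).
labels = [1, 2, 3]
corr = pd.DataFrame(
    [[1.0, -0.08, 0.36],
     [-0.08, 1.0, -0.41],
     [0.36, -0.41, 1.0]],
    index=labels, columns=labels)
std = pd.Series([2.0, 0.5, 3.0], index=labels)

# Align the standard deviations with the matrix labels, then scale:
# cov[i, j] = corr[i, j] * std[i] * std[j]
s = std.reindex(corr.index).to_numpy()
df_cov = corr * np.outer(s, s)
```

The diagonal of `df_cov` then holds the variances, since the diagonal of `corr` is 1.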
Answer 1 (score: 0)
One approach -
import numpy as np
# Input list: AList
# Convert input list to a numpy array
A = np.asarray(AList)
# Get the first two columns that are coordinates/points
A01 = A[:,0:2].astype(int)
# Determine size of square output array
N = A01.max()
# Initialize output array & insert values from third column
out = np.zeros((N,N))
out[A01[:,0]-1,A01[:,1]-1] = A[:,2]
# Upper triangular mask
triu_mask = np.triu(np.ones(out.shape,'bool'))
# Fill in the upper triangular region with the
# symmetrical elements from lower triangular region
out[triu_mask] = out.T[triu_mask]
# Fill diagonal with ones
np.fill_diagonal(out,1)
Sample run -
In [157]: AList # Input list
Out[157]:
[[2, 1, -0.0798399811877855],
[3, 1, 0.357718108972297],
[3, 2, -0.406142457763738],
[4, 1, 0.288467030571132],
[4, 2, -0.129115034405361],
[4, 3, 0.156739504479856],
[5, 1, -0.0756332254716083],
[5, 2, 0.4790369714388],
[5, 3, -0.377545460300584],
[5, 4, -0.265467953118191],
[6, 1, 0.0909003414436468],
[6, 2, -0.36356890264562],
[6, 3, 0.482042347959232],
[6, 4, 0.292931692897587],
[6, 5, -0.73986857692415]]
In [158]: print(out) # Print of output numpy array
[[ 1. -0.07983998 0.35771811 0.28846703 -0.07563323 0.09090034]
[-0.07983998 1. -0.40614246 -0.12911503 0.47903697 -0.3635689 ]
[ 0.35771811 -0.40614246 1. 0.1567395 -0.37754546 0.48204235]
[ 0.28846703 -0.12911503 0.1567395 1. -0.26546795 0.29293169]
[-0.07563323 0.47903697 -0.37754546 -0.26546795 1. -0.73986858]
[ 0.09090034 -0.3635689 0.48204235 0.29293169 -0.73986858 1. ]]
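The array above is still a correlation matrix. To get from there to the covariance matrix asked for in the question, one possible end-to-end sketch is shown below. It assumes the text file has exactly three whitespace-separated columns with 1-based point indices, and that the standard deviations are available as a 1-D array ordered by point number. The function name `cov_from_pairs` and the `std` values are hypothetical.

```python
import io
import numpy as np

def cov_from_pairs(pairs, std):
    """Build a covariance matrix from rows (i, j, r_ij) with 1-based indices."""
    pairs = np.asarray(pairs, dtype=float)
    std = np.asarray(std, dtype=float)
    i = pairs[:, 0].astype(int) - 1   # convert to 0-based row indices
    j = pairs[:, 1].astype(int) - 1   # convert to 0-based column indices
    corr = np.eye(std.size)           # ones on the diagonal
    corr[i, j] = pairs[:, 2]          # lower triangle
    corr[j, i] = pairs[:, 2]          # mirrored upper triangle
    return corr * np.outer(std, std)  # cov[i, j] = r_ij * std_i * std_j

# First three rows of the question's data, read the way a file would be.
text = """2 1 -0.798399811877855E-01
3 1 0.357718108972297E+00
3 2 -0.406142457763738E+00
"""
pairs = np.loadtxt(io.StringIO(text))
std = np.array([1.0, 2.0, 3.0])       # made-up standard deviations
cov = cov_from_pairs(pairs, std)
```

Everything here is vectorized index assignment, so a 300-point problem involves no Python-level loop.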
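Answer 2 (score: 0)
I assume that you are listing correlation coefficients and not cross covariances (unlike @Divakar and @Alexander). The entries of the covariance matrix are therefore c[i,j] = rr[i,j]*sqrt(c[i,i]*c[j,j]), where rr[i,j] is the correlation coefficient. Obviously, c[i,i] is the i-th variance and rr[i,i]==1.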
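The same relation can be written in matrix form. Here the symbols C (covariance matrix), R (correlation matrix), and sigma_i (standard deviation of point i) are notation introduced for this note, corresponding to c, rr, and sqrt(c[i,i]) above:

```latex
C_{ij} = R_{ij}\,\sigma_i\,\sigma_j,
\qquad \sigma_i = \sqrt{C_{ii}},
\qquad\text{i.e.}\qquad
C = D\,R\,D \quad\text{with}\quad D = \operatorname{diag}(\sigma_1, \dots, \sigma_n).
```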
The following example shows how to build the covariance matrix from a list of variances and a list of correlations:
import numpy as np
from itertools import product
n0 = 5 # dimension of random vector
print("Generating test-matrix CC0 ...")
# Generate a valid test covariance matrix (positive semi-definite)
sq_CC0 = np.random.randn(n0, n0)*10
CC0 = np.dot(sq_CC0, sq_CC0.T)
# extract lists:
lst_var = [(i, CC0[i, i]) for i in range(n0)] # list of variances
lst_rr = [(i, j, CC0[i,j]/np.sqrt(CC0[i, i]*CC0[j,j])) # list of correlations
          for i, j in product(range(n0), range(n0)) if i < j]
print(" Variances:")
for i, val in lst_var:
    print(" ", i, val)
print(" Correlations:")
for i, j, val in lst_rr:
    print(" ", i, j, val)
print("Building matrix CC1 ...")
n1 = len(lst_var) # dimension
# Exploit CC[i, j] = rr[i, j]* sqrt(CC[i, i]*CC[j, j]):
aa_var = np.array(lst_var) # convert to array to do index magic
ii = np.array(aa_var[:, 0], dtype=int) # indexes must be ints
vv = np.zeros(n1)
vv[ii] = np.sqrt(aa_var[ :, 1])
CC1a = np.outer(vv, vv) # Its entries are sqrt(CC[i, i]*CC[j, j])
aa_rr = np.array(lst_rr)
CC1b = np.zeros((n1, n1))
ii, jj = (np.array(aa_rr[ :, k], dtype=int) for k in [0, 1]) # indexes must be ints
CC1b[ii, jj] = aa_rr[:, 2] # build matrix with correlations
CC1 = CC1a * (CC1b + CC1b.T + np.eye(n1)) # build covariance matrix
print(" CC0 == CC1 is:", np.allclose(CC0, CC1))
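Note that parsing the text file is not included, that indices start at 0, and that each correlation is listed only once (rr[i,j]==rr[j,i], so the symmetric duplicate is not given).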