numpy: efficiently condensing a large probability matrix into clusters

Date: 2018-11-19 02:03:18

Tags: python numpy

Dear Stack Overflow users,

import numpy as np
import math
sg    = [-0.02, 0.02, 0.00, 0.01, 0.00, 0.00, -0.01, 0.0, 0.01, 0.02]
a     = [14.3, 9.6, 4.8, 11.2, 1.5, 2.8, 15.2, 3.4, 4.5, 0.3]
p_mix = [[ 0.016537663 , 0.018633189 , 0.002189919 , 0.002699641 , 0.018652067 , 0.004814046 , 0.001510289 , 0.017783651 , 0.016141212 , 0.015065131 ],
[ 0.014177837 , 0.018652067 , 0.003586936 , 0.000339815 , 0.011100623 , 0.010175571 , 0.004644138 , 0.008004531 , 0.001755711 , 0.005927884 ],
[ 0.009495941 , 0.016141212 , 0.005097225 , 0.010213328 , 0.015065131 , 0.010383236 , 0.001755711 , 0.009722484 , 0.014555409 , 0.001434774 ],
[ 0.018633189 , 0.009684727 , 0.008646404 , 0.004474231 , 0.01793468 , 0.015839154 , 0.018312252 , 0.00135926 , 0.011761374 , 0.018444402 ],
[ 0.009741363 , 0.008268831 , 0.010307721 , 0.012610912 , 0.004530867 , 0.014442137 , 0.015555975 , 0.01534831 , 0.000226543 , 0.016235605 ],
[ 0.001415896 , 0.00774023 , 0.006399849 , 0.007910138 , 0.005248254 , 0.01534831 , 0.000151029 , 0.01427223 , 0.011383802 , 0.005097225 ],
[ 0.01236549 , 0.015593732 , 0.007683594 , 0.00230319 , 0.011157259 , 0.009363791 , 0.001434774 , 0.011969039 , 0.008721918 , 0.01495186 ],
[ 0.014177837 , 0.010572022 , 0.0103266 , 0.011025109 , 0.002661884 , 0.014857466 , 0.01304512 , 0.018840853 , 0.010515386 , 0.009080612 ],
[ 0.007343779 , 0.018217859 , 0.016084576 , 0.01466868 , 0.001944497 , 0.018161223 , 0.012403247 , 0.018689824 , 0.003964508 , 0.001038324 ],
[ 0.011723617 , 0.009552577 , 0.001944497 , 0.01670757 , 0.007947895 , 0.000717387 , 0.00271852 , 0.00883519 , 0.015933547 , 0.007173872 ]]

u_sg = list(set(sg))                                             #get the set of distinct charges
u_sg = sorted(u_sg)                                              #sort the set of distinct charges in ascending order
n_ch = len(u_sg)                                                 #get the number of distinct charges
n_cl = len(sg)                                                   #get the total number of present cluster
p_simplified = np.zeros((n_ch, n_ch))                            #initialize the matrix of simplified probabilities
a_simplified = list()                                            #initialize the list of simplified areas
for j in range(n_ch):                                            #loop over the charge clusters
    a_r = 0                                                      #initialize the area of the individual charge under consideration
    for i in range(n_cl):                                        #loop over the set of initial clusters
        if math.fabs(sg[i] - u_sg[j]) < 10 ** -10:               #if the cluster charge is equal to the charge under consideration ...
            a_r  += a[i]                                         #add the corresponding area to the area of the charge under consideration
    a_simplified.append(a_r)                                     #once all clusters are treated, append the area to the list of simplified areas
    for jp in range(n_ch):                                       #loop over the charge clusters
        pi_p_iip  = 0.                                           #initialize condensed probability for the pair of charge clusters j/jp under consideration
        ar        = 0.                                           #initialize area to normalize interaction of j' with j
        for i in range(n_cl):                                    #loop over the set of initial clusters
            p_iip = 0.                                           #initialize sum of probabilities of i to interact with ip if ip charge is equal to jp charge
            if math.fabs(sg[i] - u_sg[j]) < 10 ** -10:           #if charge of i-th initial cluster is equal to j-th charge ...
                for ip in range(n_cl):                           #loop over the set of initial clusters
                    if math.fabs(sg[ip] - u_sg[jp]) < 10 ** -10: #if charge of ip-th initial cluster is equal to jp-th charge ...
                        p_iip += p_mix[i][ip]                    #add this probability to the sum of probabilities of i to interact with all jp charged segments
                pi_p_iip += a[i] * p_iip                         #add together all probabilities of segments with j-th charge to interact with segments charged jp, weighted by their areas
                ar       += a[i]                                 #add together all areas of segments with j-th charge (normalization factor)
        if ar > 10 ** -16:                                       #if the normalization area is not nil ...
            p_simplified[j][jp] = pi_p_iip / ar                  #normalize
        else:                                                    #otherwise
            p_simplified[j][jp] = 0                              #well, the probability is nil

print(a_simplified)
print(p_simplified)

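This is the algorithm I use to "condense" a vector and a matrix. The vector (a in the code) contains the areas associated with a list of types (each type has a list of properties, not shown here, whose details this algorithm does not need to know), and the square matrix (p_mix in the code) contains the probability of interaction between each pair of types. I want to condense the vector and the matrix so that they reflect the variation of only one property instead of the whole list of properties. That property is the charge, stored in the vector sg.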
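Read as a formula, the loops above appear to compute the following. The notation is introduced here for illustration and does not come from the original post: C_j denotes the set of initial clusters i whose charge sg[i] equals the j-th distinct charge u_sg[j], a_i is a[i], and p_{ii'} is p_mix[i][ip].

```latex
\tilde{a}_j = \sum_{i \in C_j} a_i ,
\qquad
\tilde{p}_{jj'} = \frac{1}{\tilde{a}_j} \sum_{i \in C_j} a_i \sum_{i' \in C_{j'}} p_{ii'}
```

Here \tilde{a}_j corresponds to a_simplified[j] and \tilde{p}_{jj'} to p_simplified[j][jp]; the code sets \tilde{p}_{jj'} to 0 when \tilde{a}_j is (numerically) zero.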

The code above produces:

[14.3, 15.2, 12.5, 15.7, 9.9]
[[0.01653766 0.00151029 0.04343968 0.01884085 0.03369832]
 [0.01236549 0.00143477 0.04017368 0.01102511 0.03054559]
 [0.00898894 0.00612301 0.04276141 0.02123255 0.01791082]
 [0.01539737 0.01661859 0.0469612  0.01692281 0.02558593]
 [0.01410347 0.00458579 0.03246091 0.00302115 0.02434197]]

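Because the initial matrix (before condensation) can exceed one billion elements, this operation is slow (it can even cause memory errors, but that is beyond the scope of this question).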

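In other parts of my code I was able to replace plain Python lists and loops with NumPy arrays and operations, which made them much faster. This particular part has proved tricky, though, and I would need some help figuring out how to convert it to NumPy.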
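For reference, one way the loops could be expressed with NumPy is sketched below. This is not the approach the accepted answer took; it is a minimal sketch under stated assumptions. The function name condense, the decimals parameter, and the three-cluster toy data are hypothetical, and rounding the charges to 10 decimals is only an approximation of the 10 ** -10 tolerance test in the original loops. The idea is to build a 0/1 membership matrix from np.unique(..., return_inverse=True) and let two matrix products do the grouping.

```python
import numpy as np

def condense(sg, a, p_mix, decimals=10):
    """Condense areas and probabilities by distinct charge (hypothetical helper)."""
    sg = np.asarray(sg, dtype=float)
    a = np.asarray(a, dtype=float)
    p_mix = np.asarray(p_mix, dtype=float)

    # Assumption: rounding stands in for the 1e-10 tolerance of the original loops.
    u_sg, inv = np.unique(np.round(sg, decimals), return_inverse=True)
    inv = inv.ravel()
    n_ch = u_sg.size

    # Total area per distinct charge (a_simplified in the original code).
    a_simplified = np.bincount(inv, weights=a, minlength=n_ch)

    # member[i, j] is 1.0 if initial cluster i carries the j-th distinct charge.
    member = (inv[:, None] == np.arange(n_ch)).astype(float)

    # Sum p_mix over the columns that belong to each charge: shape (n_cl, n_ch).
    col_sums = p_mix @ member
    # Area-weighted sum over the rows that belong to each charge: shape (n_ch, n_ch).
    weighted = member.T @ (a[:, None] * col_sums)

    # Normalize by the total area, leaving 0 where the area is (numerically) nil.
    p_simplified = np.zeros((n_ch, n_ch))
    np.divide(weighted, a_simplified[:, None], out=p_simplified,
              where=a_simplified[:, None] > 1e-16)
    return u_sg, a_simplified, p_simplified

# Toy data, made up for illustration only (not the data from the question).
toy_sg = [0.01, -0.01, 0.01]
toy_a = [1.0, 2.0, 3.0]
toy_p = [[0.01, 0.02, 0.03],
         [0.04, 0.05, 0.06],
         [0.07, 0.08, 0.09]]
u, a_s, p_s = condense(toy_sg, toy_a, toy_p)
print(u)
print(a_s)
print(p_s)
```

The only intermediates are of size n_cl x n_ch, which should stay small compared with p_mix as long as the number of distinct charges is small. Whether this is faster than a compiled loop for a given matrix size would need to be measured.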

1 Answer:

Answer 0 (score: 0)

Thanks to max9111's comment, my problem is solved.

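I downloaded Numba, imported it in my code, put the code above into a function, and added "@jit(nopython=True)" above the function's def line. As max9111 predicted, this sped up the computation by 2 to 3 orders of magnitude.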
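The answer does not show the resulting function, so the following is only a guess at its general shape. The function name condense_loops and the toy data are hypothetical. The sketch assumes the inputs are passed as NumPy arrays and that the distinct charges are computed outside the function with np.unique, since the set/sorted calls on a Python list in the question may not compile well in nopython mode. The try/except fallback is added here only so the sketch still runs (uncompiled) when Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Fallback no-op decorator so the sketch runs without Numba (no speedup then).
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@jit(nopython=True)
def condense_loops(sg, a, p_mix, u_sg):
    n_ch = u_sg.shape[0]                 # number of distinct charges
    n_cl = sg.shape[0]                   # number of initial clusters
    a_simplified = np.zeros(n_ch)
    p_simplified = np.zeros((n_ch, n_ch))
    for j in range(n_ch):
        for jp in range(n_ch):
            num = 0.0                    # area-weighted probability sum for pair j/jp
            ar = 0.0                     # total area of clusters with the j-th charge
            for i in range(n_cl):
                if abs(sg[i] - u_sg[j]) < 1e-10:
                    p_iip = 0.0
                    for ip in range(n_cl):
                        if abs(sg[ip] - u_sg[jp]) < 1e-10:
                            p_iip += p_mix[i, ip]
                    num += a[i] * p_iip
                    ar += a[i]
            a_simplified[j] = ar         # same value for every jp
            if ar > 1e-16:
                p_simplified[j, jp] = num / ar
    return a_simplified, p_simplified

# Toy data, made up for illustration only (not the data from the question).
sg_arr = np.array([0.01, -0.01, 0.01])
a_arr = np.array([1.0, 2.0, 3.0])
p_arr = np.array([[0.01, 0.02, 0.03],
                  [0.04, 0.05, 0.06],
                  [0.07, 0.08, 0.09]])
a_out, p_out = condense_loops(sg_arr, a_arr, p_arr, np.unique(sg_arr))
print(a_out)
print(p_out)
```

The first call includes Numba's compilation time, so any timing comparison should be made on a second call.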