Run a function until the Pearson correlation is > 0.99

Time: 2017-12-06 11:37:18

Tags: python pandas numpy pearson-correlation

I have the following code.

Current code:

import math
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress

c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2

def mix_gen(number):
    # Rejection sampling: draw each component uniformly within its bounds and
    # keep the draw only if the three values sum to 100 (within +/- 0.01).
    count = 0
    container = []

    while count < number:
        c1 = np.random.uniform(c1_low, c1_high)
        c2 = np.random.uniform(c2_low, c2_high)
        c3 = np.random.uniform(c3_low, c3_high)
        tot = c1 + c2 + c3

        if 99.99 <= tot <= 100.01:
            count += 1
            container.append([c1, c2, c3])
    return container

def average(x):
    assert len(x) > 0
    return float(sum(x)) / len(x)

def pearson_def(x, y):
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff

    return diffprod / math.sqrt(xdiff2 * ydiff2)

def corr_check():
    # Keep generating sets of 5 mixes until every column correlates with the
    # row index at r > 0.99, then return that set instead of looping forever.
    mylen = [1, 2, 3, 4, 5]
    while True:
        mixes = mix_gen(5)
        mixes_C1 = [item[0] for item in mixes]
        mixes_C2 = [item[1] for item in mixes]
        mixes_C3 = [item[2] for item in mixes]
        c1_r = pearson_def(mixes_C1, mylen)
        c2_r = pearson_def(mixes_C2, mylen)
        c3_r = pearson_def(mixes_C3, mylen)

        if c1_r > 0.99 and c2_r > 0.99 and c3_r > 0.99:
            return mixes

corr = corr_check()
print(corr)

When converted to a DataFrame, this code effectively gives me output like the following:

    C1    C2    C3    sum   range 
1   70    20    10    100     ^
2   ..                        |  
3   ..                        |
4   ..                        | 
5   ..                        |
6   ..                        |
7   ..                        |
8   ..                        |
9   ..                        |
10  ..                        |
11  90                        _

I need every row to sum to 100 and every column to have an r^2 value (Pearson correlation) > 0.99.
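The two conditions can be expressed as a single check. This is a minimal sketch that assumes, as `corr_check` does, that each column is correlated against the row index 1..n. The helper name `meets_requirements`, its parameters, and the 0.01 tolerance (borrowed from `mix_gen`) are illustrative and not part of the original code.

```python
import numpy as np

def meets_requirements(mixes, total=100.0, tol=0.01, r2_min=0.99):
    """Return True if every row sums to `total` (within `tol`) and every
    column has r^2 > `r2_min` against the row index 1..n.

    Hypothetical helper: the name and defaults are assumptions.
    """
    data = np.asarray(mixes, dtype=float)
    index = np.arange(1, data.shape[0] + 1)

    # Row constraint: each row must sum to the target within the tolerance.
    rows_ok = np.all(np.abs(data.sum(axis=1) - total) <= tol)

    # np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is Pearson's r.
    r_squared = [np.corrcoef(data[:, j], index)[0, 1] ** 2
                 for j in range(data.shape[1])]
    cols_ok = all(r2 > r2_min for r2 in r_squared)
    return bool(rows_ok and cols_ok)
```

Because the check squares r, a column that falls steadily with the index passes just as one that rises steadily does.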

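However, the complexity and the number of iterations required make the problem almost intractable. Is there a better way to achieve this than relying on random generation of all three components C1, C2, and C3?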
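One possible direction, offered as a sketch under stated assumptions rather than a confirmed answer: build the columns constructively instead of filtering random draws. If the first and last rows each sum to 100, every row on the straight line between them also sums to 100, and each column is linear in the row index. A small perturbation can then be added to two columns while the third absorbs the difference. The function name `linear_mixes`, the noise level, and the example endpoints are all assumptions chosen to sit inside the C1, C2, and C3 bounds from the question. The sketch does not itself enforce those bounds.

```python
import numpy as np

def linear_mixes(start, end, n=11, noise=0.05, seed=None):
    """Interpolate from composition `start` to composition `end` in n rows.

    Hypothetical helper. `start` and `end` are assumed to sum to the same
    total (100 here); component bounds are NOT checked.
    """
    rng = np.random.default_rng(seed)
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)

    # Fractions 0..1 down the rows; each column becomes a straight line.
    t = np.linspace(0.0, 1.0, n)[:, None]
    mixes = start + t * (end - start)

    # Perturb the first two columns slightly, then recompute the last column
    # so that every row still sums to the original total.
    mixes[:, :2] += rng.normal(0.0, noise, size=(n, 2))
    mixes[:, 2] = start.sum() - mixes[:, :2].sum(axis=1)
    return mixes

# Example endpoints (assumed values inside the bounds given in the question).
mixes = linear_mixes((78, 14.5, 7.5), (90.5, 7, 2.5), n=11, seed=0)
print(mixes.round(2))
```

The trade-off is that the rows are no longer independent uniform draws: the randomness is confined to the small perturbation (and to however the endpoints are chosen), which is what makes a high r^2 achievable without repeated retries.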

0 answers:

No answers