Question

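Edit:

If I have a pandas DataFrame with 5 columns (Col1, Col2, Col3, Col4, and Col5), I need to get the maximum Pearson correlation coefficient between (Col2, Col3), (Col2, Col4), and (Col2, Col5), taking into account the modified values of Col1 and Col2 obtained with the following formulas:

df['Col1'] = np.power(df['Col1'], B)
df['Col2'] = df['Col2'] * df['Col1']

where B is the varying variable (a single value) chosen to maximize the Pearson correlation coefficient between (the new value of Col2, Col3), (the new value of Col2, Col4), and (the new value of Col2, Col5).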
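To make the two formulas concrete, here is a minimal sketch that applies them for one fixed value of B and reports the three correlations. The helper name `modified_correlations` is illustrative, and the sample frame holds only the first four rows of the data posted under Update 2, so the printed numbers are not representative of the full table.

```python
import numpy as np
import pandas as pd


def modified_correlations(df, b):
    """Pearson r between the modified Col2 and each of Col3, Col4, Col5."""
    # Col1 raised to the power B (computed on a float copy; df is not changed)
    col1_mod = np.power(df["Col1"].astype(float), b)
    # New Col2 = old Col2 * modified Col1
    col2_mod = df["Col2"] * col1_mod
    return {c: col2_mod.corr(df[c]) for c in ["Col3", "Col4", "Col5"]}


# First four rows of the sample data, for illustration only
sample = pd.DataFrame({
    "Col1": [2, 4, 2, 6],
    "Col2": [0.051361397, 0.053507779, 0.041236151, 0.094526419],
    "Col3": [2618, 306, 39, 2755],
    "Col4": [1453, 153, 54, 2209],
    "Col5": [1099, 150, 34, 1947],
})

print(modified_correlations(sample, 1.0))
```

With B = 0 the modified Col1 is 1 everywhere, so the function returns the original (old) correlations, which gives a convenient baseline.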

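Update:

The table mentioned above has 5 columns, and the Pearson correlation coefficients between (Col2, Col3), (Col2, Col4), and (Col2, Col5) are shown in the following table.

I need to change the values of Col2 according to the two equations above, where the varying value is B.

So the question is: how do I find the best value of B so that each new correlation coefficient is greater than or equal to its corresponding old value?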

Update 2:

Col1,Col2,Col3,Col4,Col5
2,0.051361397,2618,1453,1099
4,0.053507779,306,153,150
2,0.041236151,39,54,34
6,0.094526419,2755,2209,1947
4,0.079773397,2313,1261,1022
4,0.083891415,3528,2502,2029
6,0.090737243,3594,2781,2508
2,0.069552772,370,234,246
2,0.052401789,690,402,280
2,0.039930675,1218,846,631
4,0.065952096,1706,523,453
2,0.053064126,314,197,123
6,0.076847486,4019,1675,1452
2,0.044881545,604,402,356
2,0.073102611,2214,1263,1050
0,0.046998526,938,648,572

Answer 1

Not very elegant, but it works; feel free to make it more generic:

import pandas as pd
from scipy.optimize import minimize


def minimize_me(b, df):
    # minimize() only minimizes, so negate the correlation to maximize it
    return -1 * df['Col3'].corr(df['Col2'] * df['Col1'] ** b)

# read your dataframe from somewhere, e.g. csv
df = pd.read_clipboard(sep=',')

# B is constrained to be non-negative for now
bnds = [(0, None)]

# start the search from an initial guess of B = 1
res = minimize(minimize_me, x0=[1.0], args=(df,), bounds=bnds)

if res.success:
    # that's the optimal B
    print(res.x[0])

    # that's the highest correlation you can get
    print(-1 * res.fun)
else:
    print("Sorry, the optimization was not successful. Try with another initial"
          " guess or optimization method")

This prints:

0.9020784246026575 # your B
0.7614993786787415 # highest correlation for corr(col2, col3)

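I am reading from the clipboard here; replace that with your .csv file. You should also avoid hard-coding the columns. The code above is for demonstration purposes only, so you can see how to set up the optimization problem yourself.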
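As a rough sketch of both suggestions, the version below reads the data with `pd.read_csv` and takes the column names as arguments. The names `neg_corr`, `best_exponent`, and `CSV_TEXT` are made up for this illustration, and the inline string stands in for a real file (replace `io.StringIO(CSV_TEXT)` with a path to your own .csv).

```python
import io

import numpy as np
import pandas as pd
from scipy.optimize import minimize

# Sample data from the question; a stand-in for a real .csv file
CSV_TEXT = """Col1,Col2,Col3,Col4,Col5
2,0.051361397,2618,1453,1099
4,0.053507779,306,153,150
2,0.041236151,39,54,34
6,0.094526419,2755,2209,1947
4,0.079773397,2313,1261,1022
4,0.083891415,3528,2502,2029
6,0.090737243,3594,2781,2508
2,0.069552772,370,234,246
2,0.052401789,690,402,280
2,0.039930675,1218,846,631
4,0.065952096,1706,523,453
2,0.053064126,314,197,123
6,0.076847486,4019,1675,1452
2,0.044881545,604,402,356
2,0.073102611,2214,1263,1050
0,0.046998526,938,648,572
"""


def neg_corr(b, df, base, weight, target):
    """Negative Pearson r between base * weight**b and target."""
    b = float(np.ravel(b)[0])  # minimize() passes a 1-element array
    modified = df[base] * np.power(df[weight].astype(float), b)
    return -modified.corr(df[target])


def best_exponent(df, base, weight, target, b0=1.0):
    """Search for the non-negative exponent that maximizes the correlation."""
    res = minimize(neg_corr, x0=[b0], args=(df, base, weight, target),
                   bounds=[(0, None)])
    return res.x[0], -res.fun, res.success


df = pd.read_csv(io.StringIO(CSV_TEXT))
b_opt, r_opt, ok = best_exponent(df, "Col2", "Col1", "Col3")
print(b_opt, r_opt, ok)
```

Passing the column names in makes it easy to repeat the search for Col4 or Col5 without editing the objective function.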

If you are interested in the sum of the correlations, you can use the following (the rest of the code is unchanged):

def minimize_me(b, df):

    col_mod = df['Col2'] * df['Col1'] ** b

    # we want to maximize, so we have to multiply by -1
    return -1 * (df['Col3'].corr(col_mod) +
                 df['Col4'].corr(col_mod) +
                 df['Col5'].corr(col_mod))

This prints:

1.0452394748131613
2.3428368479642137
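The question also asks that each new correlation be at least as large as its old value, which maximizing the sum does not by itself guarantee. One possible way to check this is a simple grid scan over B, sketched below. The range 0 to 3, the step size, and the names `correlations`, `TARGETS`, and `DATA` are all assumptions made for this illustration; this is not part of the code above.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question
DATA = """Col1,Col2,Col3,Col4,Col5
2,0.051361397,2618,1453,1099
4,0.053507779,306,153,150
2,0.041236151,39,54,34
6,0.094526419,2755,2209,1947
4,0.079773397,2313,1261,1022
4,0.083891415,3528,2502,2029
6,0.090737243,3594,2781,2508
2,0.069552772,370,234,246
2,0.052401789,690,402,280
2,0.039930675,1218,846,631
4,0.065952096,1706,523,453
2,0.053064126,314,197,123
6,0.076847486,4019,1675,1452
2,0.044881545,604,402,356
2,0.073102611,2214,1263,1050
0,0.046998526,938,648,572
"""
TARGETS = ["Col3", "Col4", "Col5"]


def correlations(df, b):
    """Pearson r between Col2 * Col1**b and each target column."""
    modified = df["Col2"] * np.power(df["Col1"].astype(float), b)
    return pd.Series({t: modified.corr(df[t]) for t in TARGETS})


df = pd.read_csv(io.StringIO(DATA))

# Old correlations, computed from the unmodified Col2
old = pd.Series({t: df["Col2"].corr(df[t]) for t in TARGETS})

# Candidate exponents; the range and resolution are arbitrary assumptions
grid = np.linspace(0.0, 3.0, 301)

# Keep only the exponents for which no correlation drops below its old value
feasible = [b for b in grid if (correlations(df, b) >= old - 1e-12).all()]

# Among those, pick the one with the largest sum of correlations
best = max(feasible, key=lambda b: correlations(df, b).sum())
print(best, correlations(df, best).to_dict())
```

B = 0 always passes the check, because NumPy evaluates Col1 ** 0 as 1 (even for the row where Col1 is 0), which leaves Col2 unchanged; the list of feasible values is therefore never empty.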
