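Python solver function (like the Excel worksheet Solver) to predict the values of a and b from a dataset

Time: 2019-07-29 09:53:22

Tags: python python-3.x pandas numpy pulp

The Excel worksheet Solver predicts a = -12.7705719809672 and b = 4.65590041575483 from the data frame below.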

Maturity (C-days)  Strength (MPa)  x                 y     y^2               (y-x)^2
10.8               23.8            23.8495018161717  -6.0  36                0.002450429804294
28.2               28.4            28.3164712450952  -1.4  1.96000000000001  0.006977052895941
70.7               32.6            32.5941766134432   2.8  7.84              3.3911830989322E-05
105.0              34.4            34.4398346638965   4.6  21.16             0.001586800447746

The row values of the data frame are computed with the following formulas.

  • x[i] = -a + b*ln(M[i])
  • y[i] = S[i] - avg_strength
  • y^2[i] = y[i]^2
  • (y-x)^2[i] = (S[i] - x[i])^2
  • sum(y^2) = 208.0944
  • sum((y-x)^2) = 0.011048194978971

where

  1. avg_strength = 23.86 (the average strength)
  2. i - Row number
  3. S - Strength
  4. M - Maturity
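
The column definitions above can be sketched in pandas as follows. This is only an illustration: it assumes the four table rows as input, plugs in the Solver values of a and b quoted in the question, and takes avg_strength as the mean of the strengths in the frame (an assumption that reproduces the table's y column; the question itself quotes 23.86). The column names are illustrative. Printed values may differ from the table in the later decimals if the table's maturities were rounded for display.

```python
import numpy as np
import pandas as pd

# Solver values quoted in the question
a = -12.7705719809672
b = 4.65590041575483

df = pd.DataFrame({
    "maturity": [10.8, 28.2, 70.7, 105.0],   # M, in C-days
    "strength": [23.8, 28.4, 32.6, 34.4],    # S, in MPa
})

# Assumption: avg_strength is the mean of the rows in this frame
avg_strength = df["strength"].mean()

df["x"] = -a + b * np.log(df["maturity"])          # x[i] = -a + b*ln(M[i])
df["y"] = df["strength"] - avg_strength            # y[i] = S[i] - avg_strength
df["y2"] = df["y"] ** 2                            # y^2[i]
df["resid2"] = (df["strength"] - df["x"]) ** 2     # (S[i] - x[i])^2

print(df)
print("sum(y^2)     =", df["y2"].sum())
print("sum((y-x)^2) =", df["resid2"].sum())
```

Passing a different avg_strength (for example the 23.86 quoted above) only changes the y and y2 columns; x and the squared residuals depend on a, b and the maturities alone.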

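Calculating R^2

First, the values of a and b are used to compute x and (y-x)^2. Then:

R^2 = 1 - sum((y-x)^2)/sum(y^2)

R^2 must be >= 0.9, and the values of a and b are what has to be predicted.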
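
The R^2 check above can be written as a small helper. This is a sketch under stated assumptions: the function name r_squared is hypothetical, and the average strength is taken as the mean of whatever rows are passed in rather than a fixed constant.

```python
import numpy as np

def r_squared(a, b, maturity, strength):
    """R^2 = 1 - sum((S - x)^2) / sum(y^2), with x = -a + b*ln(M)."""
    maturity = np.asarray(maturity, dtype=float)
    strength = np.asarray(strength, dtype=float)
    x = -a + b * np.log(maturity)
    y = strength - strength.mean()          # assumption: mean of the supplied rows
    ss_res = np.sum((strength - x) ** 2)    # sum((y-x)^2) in the question's notation
    ss_tot = np.sum(y ** 2)                 # sum(y^2)
    return 1.0 - ss_res / ss_tot

M = [10.8, 28.2, 70.7, 105.0]
S = [23.8, 28.4, 32.6, 34.4]

# Evaluate the Solver values of a and b quoted in the question
r2 = r_squared(-12.7705719809672, 4.65590041575483, M, S)
print("R^2 =", r2, "meets R^2 >= 0.9:", r2 >= 0.9)
```

Any candidate pair (a, b) can be passed to the same helper to see whether it satisfies the R^2 >= 0.9 requirement.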

  

I am looking for a Python solution that does the same thing as the Excel Solver function and predicts the values of a and b.

My Python code:

import pandas as pd
import numpy as np
from pulp import LpVariable, LpMinimize, LpProblem, lpSum
import math

m_s_data = {'maturity':[0.1,10.8,28.2,70.7,105.0],'strength':[0.1,23.8,28.4,32.6,34.4]}
df = pd.DataFrame(m_s_data)
strength_avg = round(df['strength'].mean(),2)
df['y'] = df['strength'] - strength_avg
df['y2'] = df['y']**2
y2_sum = sum([df['y2'][idx] for idx in df.index])

x = LpVariable.dicts("x", df.index,lowBound=-100,upBound=100)
y3 = LpVariable.dicts("y3", df.index,lowBound=-100,upBound=100)
mod = LpProblem("calAB", LpMinimize)
a=1
b=1
for idx in df.index:
    x_row_data = -a + b * np.log(df['maturity'][idx])
    mod += x[idx] == x_row_data
    strength = df['strength'][idx]
    mod += y3[idx] == math.pow(strength,2) + x_row_data * x_row_data -2* strength * x_row_data

#R^2 must be greater than or equal to 0.9
mod += 1- (lpSum(y3[idx] for idx in df.index)/y2_sum) >= 0.9
mod.solve()
print(df)

# Each variable is printed with its resolved optimum value
for idx in df.index:
    print(y3[idx].name, "=", y3[idx].varValue)

Input data frame: [image]

Expected output data frame: [image]

1 Answer:

Answer 0: (score: 1)

Because x = -a + b*ln(M) is linear in a and b, you can fit them by least squares with any linear solver. Here I use np.linalg.lstsq().

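import numpy as np

# We have a system of linear equations A x = b. From your equation,
# each row reads: -x[0] + x[1]*ln(M) = b, so x[0] is a and x[1] is b.

M = np.log([10.8, 28.2, 70.7, 105.0])                    # ln(maturity)
A = np.column_stack((-np.ones(M.size), M))               # one row [-1, ln(M)] per sample
b = np.array([23.84950, 28.31647, 32.59418, 34.43983])   # x column from the question's table
x = np.linalg.lstsq(A, b, rcond=None)[0]                 # least-squares solution [a, b]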

Result:

x = array([-12.77023019,   4.65571618])
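
The code above fits the x column taken from the question's table. As a variation, here is a minimal sketch that fits the measured strengths directly and then checks the R^2 >= 0.9 requirement. It swaps in np.polyfit for lstsq (both are ordinary least squares) and assumes the four table rows as input, so the result should land close to the Solver values quoted in the question, though not necessarily on the same digits.

```python
import numpy as np

M = np.array([10.8, 28.2, 70.7, 105.0])   # maturity (C-days)
S = np.array([23.8, 28.4, 32.6, 34.4])    # measured strength (MPa)

# Fit S ~ intercept + slope*ln(M). In the question's notation
# x = -a + b*ln(M), so b = slope and a = -intercept.
slope, intercept = np.polyfit(np.log(M), S, 1)
a, b = -intercept, slope

# R^2 = 1 - sum((S - x)^2) / sum(y^2), with y = S - mean(S)
x = -a + b * np.log(M)
r2 = 1.0 - np.sum((S - x) ** 2) / np.sum((S - S.mean()) ** 2)

print("a =", a, "b =", b)
print("R^2 =", r2, "meets R^2 >= 0.9:", r2 >= 0.9)
```

Since least squares minimizes sum((S - x)^2), it also maximizes R^2 for this model, so checking the constraint after the fit is enough; no constrained solver is needed for this particular problem.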