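I am trying to solve for the intersection of two equations, y = R*x^1.75 and y = a*x^2 + b*x + c, for every row in my dataframe (about 100K rows). The values of R, a, b, and c are different for each row. I can solve them one at a time by looping over the dataframe and calling fsolve() for each row (as shown below), but I am wondering whether there is a better way to do this.
My question is: can this be turned into an array calculation, i.e. can all rows be solved at once? Any ideas on how to make the calculation faster would be very helpful.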
Here is a sample dataframe with the coefficients:
     R     a     b      c
0  0.5 -0.01 -0.50  32.42
1  0.6  0.00  0.07  14.12
2  0.7 -0.01 -0.50  32.42
Here is the working example code I am using to test the approach:
import numpy as np
import pandas as pd
from scipy.optimize import fsolve

# The function passed to fsolve
def myFunction(zGuess, *Params):
    # Get the coefficients
    R, a, b, c = Params
    # Unpack the current estimate of the intersection point
    x, y = zGuess
    F = np.empty(2)
    F[0] = R*x**1.75 - y
    F[1] = a*x**2 + b*x + c - y
    return F

# Example dataframe: three rows of different coefficients (the real data has far more)
df = pd.DataFrame({"R": [0.500, 0.600, 0.700],
                   "a": [-0.01, 0.000, -0.01],
                   "b": [-0.50, 0.070, -0.50],
                   "c": [32.42, 14.12, 32.42]})
# Initial guess
zGuess = np.array([50, 50])
# Make a place to store the answers
df["x"] = None
df["y"] = None
# Loop through the rows?
for index, coeffs in df.iterrows():
    # Get the coefficients
    Params = (coeffs["R"], coeffs["a"], coeffs["b"], coeffs["c"])
    # Solve this row's two equations with fsolve
    z = fsolve(myFunction, zGuess, args=Params)
    # Set the answers
    df.loc[index, "x"] = z[0]
    df.loc[index, "y"] = z[1]
print(df)
============================================
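I got two answers below, and both give mathematically correct results. So at this point it comes down to which one computes faster! The test dataframe has 3K rows.
Answer #1 (Newton's method)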
# Solution 1
import numpy as np
import pandas as pd
from datetime import datetime

Count = 1000
df = pd.DataFrame({"R": [0.500, 0.600, 0.700]*Count,
                   "a": [-0.01, 0.000, -0.01]*Count,
                   "b": [-0.50, 0.070, -0.50]*Count,
                   "c": [32.42, 14.12, 32.42]*Count})
t_start = datetime.now()
#---------------------------------
InitialGuess = 50.0
Iterations = 20
x = np.full(df["a"].shape, InitialGuess)
# Newton update x <- x - f(x)/f'(x), applied to all rows at once
for i in range(Iterations):
    x = x - (-df["R"]*x**1.75 + df["a"]*x**2 + df["b"]*x + df["c"])/(-1.75*df["R"]*x**0.75 + 2*df["a"]*x + df["b"])
df["x"] = x
df["y"] = df["R"]*x**1.75
# Residual of the eliminated equation; should be close to zero
df["x Error"] = df["a"]*x**2 + df["b"]*x + df["c"] - df["R"]*x**1.75
#---------------------------------
t_end = datetime.now()
print('\n\n\nTime spent running this was:')
print(t_end - t_start)
print(df)
Time taken:
Time spent running this was:
0:00:00.015000
Answer #2 (fsolve)
# Solution 2
import numpy as np
import pandas as pd
from scipy.optimize import fsolve
from datetime import datetime

Count = 1000
df = pd.DataFrame({"R": [0.500, 0.600, 0.700]*Count,
                   "a": [-0.01, 0.000, -0.01]*Count,
                   "b": [-0.50, 0.070, -0.50]*Count,
                   "c": [32.42, 14.12, 32.42]*Count})
t_start = datetime.now()
#---------------------------------
# Columns of coefs are R, a, b, c
coefs = df.values[:, 0:4]

def mfun(x, *args):
    # fsolve passes the coefficient array as the single extra argument
    args = np.array(args[0], dtype=np.float64)
    return args[:,1] * x**2 + args[:,2] * x + args[:,3] - args[:,0] * x**1.75

nrows = coefs.shape[0]
# All rows are handed to fsolve as one system with nrows unknowns
df["x"] = fsolve(mfun, np.ones(nrows) * 50, args=coefs)
df["y"] = coefs[:, 0] * df["x"]**1.75
#---------------------------------
t_end = datetime.now()
print('\n\n\nTime spent running this was:')
print(t_end - t_start)
print(df)
Time taken:
Time spent running this was:
0:00:35.786000
For this particular case, Newton's method is much faster (I can run 300K rows in 0:00:01.139000!). Thank you both!
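Another option that was not part of the timings above is SciPy's own `scipy.optimize.newton`, which accepts an array of starting points and then solves each element independently. The sketch below assumes a SciPy version with that array support (1.2 or later, as far as I know); the helper names `f` and `fprime` and the starting value of 10.0 are illustrative choices, not something taken from the answers.

```python
import numpy as np
from scipy.optimize import newton

def f(x, R, a, b, c):
    # Difference between the quadratic and the power law; zero at an intersection
    return a * x**2 + b * x + c - R * x**1.75

def fprime(x, R, a, b, c):
    # Derivative of f with respect to x
    return 2 * a * x + b - 1.75 * R * x**0.75

# Coefficients from the sample dataframe in the question
R = np.array([0.5, 0.6, 0.7])
a = np.array([-0.01, 0.00, -0.01])
b = np.array([-0.50, 0.07, -0.50])
c = np.array([32.42, 14.12, 32.42])

# One starting point per row (assumed value; adjust for other data)
x0 = np.full(R.shape, 10.0)
x = newton(f, x0, fprime=fprime, args=(R, a, b, c))
y = R * x**1.75
print(x)
print(y)
```

No speed claim is made here; it would need to be timed against the hand-written loop on the real data.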
Answer 0 (score: 0)
Maybe you could use Newton's method:
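import numpy as np
# Each row holds R, a, b, c
data = np.array(
    [[0.5, -0.01, -0.50, 32.42],
     [0.6, 0.00, 0.07, 14.12],
     [0.7, -0.01, -0.50, 32.42]])
R, a, b, c = data.T
# Initial value of x for every row
x = np.full(a.shape, 10.0)
# Step multiplier; 1.0 gives the plain Newton update
m = 1.0
for i in range(20):
    x = x - m * (-R*x**1.75 + a*x**2 + b*x + c)/(-1.75*R*x**0.75 + 2*a*x + b)
# Residuals; should be close to zero
print(a*x**2 + b*x + c - R * x**1.75)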
Output:
[ 0.00000000e+00 1.77635684e-15 3.55271368e-15]
Take care when choosing the iteration count and the initial value of x.
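One way to make the iteration count less of a guess is to stop once the Newton step becomes negligible for every row, with a cap on the number of iterations. This is only a sketch: the function name `newton_intersections` and the `tol` and `max_iter` defaults are assumptions chosen for illustration, and it relies on x staying positive so that `x**1.75` remains real.

```python
import numpy as np

def newton_intersections(R, a, b, c, x0=10.0, tol=1e-10, max_iter=50):
    """Vectorized Newton iteration for a*x**2 + b*x + c - R*x**1.75 = 0."""
    R, a, b, c = (np.asarray(v, dtype=float) for v in (R, a, b, c))
    x = np.full(R.shape, x0, dtype=float)
    for _ in range(max_iter):
        f = a * x**2 + b * x + c - R * x**1.75
        fprime = 2 * a * x + b - 1.75 * R * x**0.75
        step = f / fprime
        x = x - step
        # Stop when every row's step is small relative to its current x
        if np.all(np.abs(step) <= tol * (1.0 + np.abs(x))):
            break
    # Return the residual too, so rows that did not converge can be spotted
    residual = a * x**2 + b * x + c - R * x**1.75
    return x, residual

data = np.array([[0.5, -0.01, -0.50, 32.42],
                 [0.6, 0.00, 0.07, 14.12],
                 [0.7, -0.01, -0.50, 32.42]])
R, a, b, c = data.T
x, residual = newton_intersections(R, a, b, c)
print(x)
print(residual)
```

Checking the returned residual (and looking for NaN values) is a cheap way to flag rows where the initial value was a poor choice.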
Answer 1 (score: 0)
You can eliminate one variable and then use NumPy's array broadcasting:
import numpy as np
from scipy.optimize import fsolve

# Your `df` (as built in the question):
#     R     a     b      c        x        y
# 0  0.5 -0.01 -0.50  32.42  9.69483  26.6327
# 1  0.6  0.00  0.07  14.12  6.18463  14.5529
# 2  0.7 -0.01 -0.50  32.42  8.17467  27.6644

# Solved in one go
# Columns of coefs are R, a, b, c
coefs = df.values[:, 0:4]

def mfun(x, *args):
    # fsolve passes the coefficient array as the single extra argument
    args = np.array(args[0], dtype=np.float64)
    return args[:,1] * x**2 + args[:,2] * x + args[:,3] - args[:,0] * x**1.75

nrows = coefs.shape[0]
x = fsolve(mfun, np.ones(nrows) * 50, args=coefs)
y = coefs[:, 0] * x**1.75
x, y
#(array([ 9.69482605, 6.18462999, 8.17467496]),
#array([26.632690454652423, 14.552924099681404, 27.66440941242009], dtype=object))
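Written out, the elimination sets the two expressions for y equal, which leaves one scalar equation per row. The row index i is notation added here for clarity and does not appear in the code above:

```latex
\begin{aligned}
f_i(x_i) &= a_i x_i^{2} + b_i x_i + c_i - R_i x_i^{1.75} = 0,
  \qquad y_i = R_i x_i^{1.75} \\
f_i'(x_i) &= 2 a_i x_i + b_i - 1.75\, R_i x_i^{0.75} \\
\frac{\partial f_i}{\partial x_j} &= 0 \quad (i \neq j)
\end{aligned}
```

The last line says the rows are independent, so the Jacobian of the stacked system is diagonal. When no Jacobian is supplied, fsolve estimates a full N-by-N Jacobian by finite differences, which is a likely reason this approach slows down as the number of rows grows.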