我的长期目标是创建一个模块,该模块针对特定数据集,将分段回归拟合到任意数量的断点,以及标准多项式和线性曲线拟合,然后评估哪个拟合最适合数据(可能使用AIC或BIC)。
我有一个函数,该函数使用微分进化,在假设1个断点的情况下对x和y数据集使用分段回归:
def segReg_one(xData,yData):
def func(xVals,model_break,slopeA,slopeB,offsetA,offsetB): #Initialization of the piecewise function
returnArray=[]
for x in xVals:
if x > model_break:
returnArray.append(slopeA * x + offsetA)
else:
returnArray.append(slopeB * x + offsetB)
return returnArray
def sumSquaredError(parametersTuple): #Definition of an error function to minimize
modely=func(xData,*parametersTuple)
warnings.filterwarnings("ignore") # Ignore warnings by genetic algorithm
return np.sum((yData-modely)**2.0)
def generate_genetic_Parameters():
initial_parameters=[]
x_max=np.max(xData)
x_min=np.min(xData)
y_max=np.max(yData)
y_min=np.min(yData)
slope=10*(y_max-y_min)/(x_max-x_min)
initial_parameters.append([x_max,x_min]) #Bounds for model break point
initial_parameters.append([-slope,slope]) #Bounds for slopeA
initial_parameters.append([-slope,slope]) #Bounds for slopeB
initial_parameters.append([y_max,y_min]) #Bounds for offset A
initial_parameters.append([y_max,y_min]) #Bounds for offset B
result=differential_evolution(sumSquaredError,initial_parameters,seed=3)
return result.x
geneticParameters = generate_genetic_Parameters() #Generates genetic parameters
fittedParameters, pcov= curve_fit(func, xData, yData, geneticParameters) #Fits the data
print('Parameters:', fittedParameters)
print('Model break at: ', fittedParameters[0])
print('Slope of line where x < model break: ', fittedParameters[1])
print('Slope of line where x > model break: ', fittedParameters[2])
print('Offset of line where x < model break: ', fittedParameters[3])
print('Offset of line where x > model break: ', fittedParameters[4])
model=func(xData,*fittedParameters)
absError = model - yData
SE = np.square(absError)
MSE = np.mean(SE)
RMSE = np.sqrt(MSE)
Rsquared = 1.0 - (np.var(absError) / np.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
axes.plot(xData, yData, 'D')
xModel = np.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all')
graphWidth = 800
graphHeight = 600
return ModelAndScatterPlot(800,600)
哪个运行正常。但是,我尝试扩展模型以允许超过1个断点:
def segReg_two(xData,yData):
def func(xData,break1,break2,slope1,slope_mid,slope2,offset1,offset_mid,offset2):
returnArray=[]
for x in xData:
if x < break1:
returnArray.append(slope1 * x + offset1)
if (x < break2 and x > break1):
returnArray.append(slope_mid * x + offset_mid)
else:
returnArray.append(slope2 * x + offset2)
def sumSquaredError(parametersTuple): #Definition of an error function to minimize
modely=func(xData,*parametersTuple)
warnings.filterwarnings("ignore") # Ignore warnings by genetic algorithm
return np.sum((yData-modely)**2.0)
def generate_genetic_Parameters():
initial_parameters=[]
x_max=np.max(xData)
x_min=np.min(xData)
y_max=np.max(yData)
y_min=np.min(yData)
slope=10*(y_max-y_min)/(x_max-x_min)
initial_parameters.append([x_max,x_min]) #Bounds for model break point
initial_parameters.append([x_max,x_min])
initial_parameters.append([-slope,slope])
initial_parameters.append([-slope,slope])
initial_parameters.append([-slope,slope])
initial_parameters.append([y_max,y_min])
initial_parameters.append([y_max,y_min])
initial_parameters.append([y_max,y_min])
result=differential_evolution(sumSquaredError,initial_parameters,seed=3)
return result.x
geneticParameters = generate_genetic_Parameters() #Generates genetic parameters
fittedParameters, pcov= curve_fit(func, xData, yData, geneticParameters) #Fits the data
print('Parameters:', fittedParameters)
print('Model break at: ', fittedParameters[0])
print('Slope of line where x < model break: ', fittedParameters[1])
print('Slope of line where x > model break: ', fittedParameters[2])
print('Offset of line where x < model break: ', fittedParameters[3])
print('Offset of line where x > model break: ', fittedParameters[4])
model=func(xData,*fittedParameters)
absError = model - yData
SE = np.square(absError)
MSE = np.mean(SE)
RMSE = np.sqrt(MSE)
Rsquared = 1.0 - (np.var(absError) / np.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
axes.plot(xData, yData, 'D')
xModel = np.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all')
graphWidth = 800
graphHeight = 600
return ModelAndScatterPlot(800,600)
当我运行segReg_two(x,y)
并停在differential_evolution
位时,此代码会出现问题:
TypeError:--'float'和'NoneType'的不受支持的操作数类型 在处理上述异常期间,发生了另一个异常:
RuntimeError:类似于地图的可调用对象必须采用f(func,iterable)的形式,并返回与'iterable'相同长度的数字序列。
我在segReg_one
上没有遇到这个问题,所以我不明白为什么在这里发生它。我假设(并且我可能对此假设不正确),参数iterable
必须具有与我的错误函数兼容的尺寸。但是,除了我正在找到断点,斜率和偏移量以在给定边界的情况下将断点最小化的事实之外,我不确定这两个参数是如何精确关联的。
此外,我的进攻计划似乎是漫长而残酷的。有解决这个问题的更好方法吗?
我认为也许正在考虑将我的分段函数视为无类型。打印带有一些随机值的函数只会返回“无”。但是,我的分段函数可以打印相同的内容,但仍然可以正常工作。