Piecewise regression in Python using differential evolution

Time: 2019-06-11 16:19:44

Tags: python linear-regression piecewise differential-evolution

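My long-term goal is to create a module that, for a given dataset, fits piecewise regressions with an arbitrary number of breakpoints, as well as standard polynomial and linear curve fits, and then evaluates which fit suits the data best (probably using AIC or BIC).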
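
For the model-comparison step, here is a minimal sketch of how AIC and BIC could be computed from a least-squares fit. It assumes Gaussian residuals and a nonzero residual sum of squares. The helper name `aic_bic` and the toy numbers are hypothetical and not part of the original code.

```python
import numpy as np

def aic_bic(y, y_model, n_params):
    """Return (AIC, BIC) for a least-squares fit, assuming Gaussian errors."""
    y = np.asarray(y, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    n = y.size
    rss = np.sum((y - y_model) ** 2)  # residual sum of squares, must be > 0
    # Both criteria are given up to a constant shared by all models on the same data
    aic = n * np.log(rss / n) + 2 * n_params
    bic = n * np.log(rss / n) + n_params * np.log(n)
    return aic, bic

# Toy example: a 2-parameter model (slope and offset) on four points
aic, bic = aic_bic([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9], n_params=2)
print(aic, bic)
```

The model with the lower value is preferred. A one-breakpoint fit as written below would count 5 parameters, a two-breakpoint fit 8.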

I have a function that uses differential evolution to fit a piecewise regression to an x and y dataset, assuming one breakpoint:

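import warnings

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit, differential_evolution

def segReg_one(xData,yData):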

    def func(xVals,model_break,slopeA,slopeB,offsetA,offsetB): #Initialization of the piecewise function
        returnArray=[]
        for x in xVals:
            if x > model_break:
                returnArray.append(slopeA * x + offsetA)
            else:
                returnArray.append(slopeB * x + offsetB)


        return returnArray

    def sumSquaredError(parametersTuple): #Definition of an error function to minimize
        modely=func(xData,*parametersTuple)
        warnings.filterwarnings("ignore") # Ignore warnings by genetic algorithm

        return np.sum((yData-modely)**2.0)

    def generate_genetic_Parameters():
        initial_parameters=[]
        x_max=np.max(xData)
        x_min=np.min(xData)
        y_max=np.max(yData)
        y_min=np.min(yData)
        slope=10*(y_max-y_min)/(x_max-x_min)

        # differential_evolution documents bounds as (min, max) pairs
        initial_parameters.append([x_min,x_max]) #Bounds for model break point
        initial_parameters.append([-slope,slope]) #Bounds for slopeA
        initial_parameters.append([-slope,slope]) #Bounds for slopeB
        initial_parameters.append([y_min,y_max]) #Bounds for offset A
        initial_parameters.append([y_min,y_max]) #Bounds for offset B

        result=differential_evolution(sumSquaredError,initial_parameters,seed=3)

        return result.x

    geneticParameters = generate_genetic_Parameters() #Generates genetic parameters



    fittedParameters, pcov= curve_fit(func, xData, yData, geneticParameters) #Fits the data 
    print('Parameters:', fittedParameters)
    print('Model break at: ', fittedParameters[0])
    # func applies slopeA/offsetA where x > model_break and slopeB/offsetB otherwise
    print('Slope of line where x > model break: ', fittedParameters[1])
    print('Slope of line where x <= model break: ', fittedParameters[2])
    print('Offset of line where x > model break: ', fittedParameters[3])
    print('Offset of line where x <= model break: ', fittedParameters[4])





    model=func(xData,*fittedParameters)

    absError = model - yData

    SE = np.square(absError) 
    MSE = np.mean(SE) 
    RMSE = np.sqrt(MSE) 
    Rsquared = 1.0 - (np.var(absError) / np.var(yData))

    print()
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)



    def ModelAndScatterPlot(graphWidth, graphHeight):
            f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
            axes = f.add_subplot(111)


            axes.plot(xData, yData,  'D')

            xModel = np.linspace(min(xData), max(xData))
            yModel = func(xModel, *fittedParameters)

            axes.plot(xModel, yModel)

            axes.set_xlabel('X Data') # X axis data label
            axes.set_ylabel('Y Data') # Y axis data label

            plt.show()
            plt.close('all') 


    graphWidth = 800
    graphHeight = 600

    return ModelAndScatterPlot(graphWidth, graphHeight)

This works fine. However, I tried to extend the model to allow more than one breakpoint:

def segReg_two(xData,yData):

    def func(xData,break1,break2,slope1,slope_mid,slope2,offset1,offset_mid,offset2):
        returnArray=[]
        for x in xData:
            if x < break1:
                returnArray.append(slope1 * x + offset1)

            if (x < break2 and x > break1):
                returnArray.append(slope_mid * x + offset_mid)

            else:
                returnArray.append(slope2 * x + offset2)


    def sumSquaredError(parametersTuple): #Definition of an error function to minimize
        modely=func(xData,*parametersTuple)
        warnings.filterwarnings("ignore") # Ignore warnings by genetic algorithm

        return np.sum((yData-modely)**2.0)

    def generate_genetic_Parameters():
        initial_parameters=[]
        x_max=np.max(xData)
        x_min=np.min(xData)
        y_max=np.max(yData)
        y_min=np.min(yData)
        slope=10*(y_max-y_min)/(x_max-x_min)

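        # differential_evolution documents bounds as (min, max) pairs
        initial_parameters.append([x_min,x_max]) #Bounds for break1
        initial_parameters.append([x_min,x_max]) #Bounds for break2
        initial_parameters.append([-slope,slope]) #Bounds for slope1
        initial_parameters.append([-slope,slope]) #Bounds for slope_mid
        initial_parameters.append([-slope,slope]) #Bounds for slope2
        initial_parameters.append([y_min,y_max]) #Bounds for offset1
        initial_parameters.append([y_min,y_max]) #Bounds for offset_mid
        initial_parameters.append([y_min,y_max]) #Bounds for offset2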


        result=differential_evolution(sumSquaredError,initial_parameters,seed=3)

        return result.x

    geneticParameters = generate_genetic_Parameters() #Generates genetic parameters



    fittedParameters, pcov= curve_fit(func, xData, yData, geneticParameters) #Fits the data 
    print('Parameters:', fittedParameters)
    # Parameter order follows func: break1, break2, slope1, slope_mid, slope2, offset1, offset_mid, offset2
    print('Model breaks at: ', fittedParameters[0], fittedParameters[1])
    print('Slope of line where x < first break: ', fittedParameters[2])
    print('Slope of line between the breaks: ', fittedParameters[3])
    print('Slope of line where x > second break: ', fittedParameters[4])
    print('Offsets (first, middle, last): ', fittedParameters[5], fittedParameters[6], fittedParameters[7])





    model=func(xData,*fittedParameters)

    absError = model - yData

    SE = np.square(absError) 
    MSE = np.mean(SE) 
    RMSE = np.sqrt(MSE) 
    Rsquared = 1.0 - (np.var(absError) / np.var(yData))

    print()
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)

    def ModelAndScatterPlot(graphWidth, graphHeight):
            f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
            axes = f.add_subplot(111)


            axes.plot(xData, yData,  'D')

            xModel = np.linspace(min(xData), max(xData))
            yModel = func(xModel, *fittedParameters)

            axes.plot(xModel, yModel)

            axes.set_xlabel('X Data') # X axis data label
            axes.set_ylabel('Y Data') # Y axis data label

            plt.show()
            plt.close('all') 


    graphWidth = 800
    graphHeight = 600

    return ModelAndScatterPlot(graphWidth, graphHeight)

When I run segReg_two(x,y), the code fails at the differential_evolution step:

  

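TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'

During handling of the above exception, another exception occurred:

RuntimeError: The map-like callable must be of the form f(func, iterable), returning a sequence of numbers the same length as 'iterable'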

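I didn't run into this problem with segReg_one, so I don't understand why it happens here. I assume (and I may be wrong about this) that the iterable argument must have dimensions compatible with my error function. However, I'm not sure exactly how the two are related, beyond the fact that I'm searching for the breakpoints, slopes, and offsets that minimize the error within the given bounds.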
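
To illustrate how the bounds and the error function relate: as far as I understand SciPy's `differential_evolution`, it calls the objective with a 1-D array holding one value per `(min, max)` pair in `bounds`, in the same order, and expects a single number back. The sketch below uses made-up toy data and a hypothetical two-parameter `sse` function, not the functions from this post.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy data lying exactly on the line y = 2x + 1 (illustrative only)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0

def sse(params):
    # params has one entry per bounds pair, in the same order as bounds
    slope, offset = params
    return np.sum((y - (slope * x + offset)) ** 2)  # must be a scalar

bounds = [(-10.0, 10.0), (-5.0, 5.0)]  # (min, max) for slope, then offset
result = differential_evolution(sse, bounds, seed=3)
print(result.x)
```

If the objective returns `None` or an array instead of a scalar, the population energies cannot be computed, which would be consistent with the RuntimeError quoted above.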

Also, my plan of attack seems long-winded and brute-force. Is there a better way to approach this problem?
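
One possible way to avoid writing a separate function per breakpoint count is a single vectorized evaluator that takes any number of breakpoints. This is only a sketch under my own assumptions: the names `piecewise_linear` and `unpack` are hypothetical, the segments are independent lines (not forced to be continuous), and the flat parameter layout is breaks first, then slopes, then offsets.

```python
import numpy as np

def piecewise_linear(x, breaks, slopes, offsets):
    """Evaluate len(breaks) + 1 independent line segments at x."""
    x = np.asarray(x, dtype=float)
    breaks = np.sort(np.asarray(breaks, dtype=float))
    # Segment index per point: 0 left of the first break, 1 after it, and so on
    idx = np.searchsorted(breaks, x, side="right")
    return np.asarray(slopes, dtype=float)[idx] * x + np.asarray(offsets, dtype=float)[idx]

def unpack(params, n_breaks):
    """Split a flat optimizer vector into (breaks, slopes, offsets)."""
    params = np.asarray(params, dtype=float)
    breaks = params[:n_breaks]
    slopes = params[n_breaks:2 * n_breaks + 1]
    offsets = params[2 * n_breaks + 1:]
    return breaks, slopes, offsets

# One breakpoint at 1.5: y = x on the left, y = 2x + 10 on the right
print(piecewise_linear([0.0, 1.0, 2.0, 3.0], [1.5], [1.0, 2.0], [0.0, 10.0]))
```

An error function for any breakpoint count could then call `piecewise_linear(xData, *unpack(params, n_breaks))`, with `3 * n_breaks + 2` bounds pairs passed to the optimizer.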

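I think perhaps my piecewise function is being treated as NoneType. Printing the function's result for some random values just returns "None". However, my other piecewise function prints the same thing and still works fine.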
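
A possible explanation, offered as a reading of the code above rather than a confirmed diagnosis: a Python function that ends without a `return` statement returns `None`, and `func` in `segReg_two` never returns `returnArray`, so `yData - modely` ends up subtracting `None`. Separately, the second `if` is not chained to the first with `elif`, so any `x` below `break1` appends two values instead of one. A minimal sketch of a corrected two-breakpoint function follows; `func_two` is a hypothetical name, and it assumes `break1 <= break2`.

```python
import numpy as np

def func_two(xVals, break1, break2, slope1, slope_mid, slope2,
             offset1, offset_mid, offset2):
    returnArray = []
    for x in xVals:
        if x < break1:
            returnArray.append(slope1 * x + offset1)
        elif x < break2:  # elif ensures exactly one branch runs per x
            returnArray.append(slope_mid * x + offset_mid)
        else:
            returnArray.append(slope2 * x + offset2)
    # Without an explicit return the function would return None
    return np.array(returnArray)

print(func_two([0.0, 5.0, 10.0], 3.0, 7.0, 1.0, 2.0, 3.0, 0.0, 0.5, 1.0))
```

Since the optimizer may propose `break1 > break2`, the middle segment can end up empty for some candidates; sorting the two breaks inside the function is one way to handle that.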

0 answers:

No answers