Polynomial analysis

Time: 2019-10-23 12:56:15

Tags: python regression non-linear-regression

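I am trying to analyze a very large dataset using multiple regression (more than 100 factors). The vast majority of the factors are dummy variables or can be modeled with linear regression.

However, the rest follow a third-order polynomial. Is there an easy way to compute this in Python? I am using Spyder in Anaconda (v3.7).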
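For context, one common way to handle this kind of mixed model is to keep everything in a single ordinary least-squares problem: the linear and dummy factors enter the design matrix as they are, and each cubic factor contributes three columns (its first, second, and third powers). The following is a minimal sketch under that assumption, using only NumPy. The function name `fit_linear_plus_cubic` and the synthetic data are hypothetical, made up for illustration, and not taken from the dataset described above.

```python
import numpy as np

def fit_linear_plus_cubic(X_lin, x_poly, y):
    """Least-squares fit of y on linear columns plus x_poly, x_poly**2, x_poly**3.

    Hypothetical helper for illustration; returns the coefficients in the
    order [intercept, linear factors..., x_poly, x_poly**2, x_poly**3].
    """
    X_lin = np.asarray(X_lin, dtype=float)
    x_poly = np.asarray(x_poly, dtype=float).reshape(-1, 1)
    # Design matrix: intercept, linear factors, then powers 1..3 of the cubic factor
    A = np.hstack([np.ones((X_lin.shape[0], 1)), X_lin,
                   x_poly, x_poly**2, x_poly**3])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs

# Synthetic, noise-free data (invented for this sketch only)
rng = np.random.default_rng(0)
dummy = rng.integers(0, 2, size=(200, 2))   # two 0/1 dummy variables
x = rng.uniform(-2, 2, size=200)            # one factor with a cubic effect
y = 1.0 + 0.5 * dummy[:, 0] - 2.0 * dummy[:, 1] + 3.0 * x - x**2 + 0.25 * x**3

coeffs = fit_linear_plus_cubic(dummy, x, y)
print(np.round(coeffs, 3))
```

Because the model stays linear in its coefficients, the same idea should carry over to more than 100 factors by adding columns, although with that many columns it is worth checking the conditioning of the design matrix.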

I tried the multipolyfit function (on dummy data from another thread, linked below) but could not get it to work. My code and the errors are shown here. Multivariate (polynomial) best fit curve in python?


from numpy import linalg, zeros, ones, hstack, asarray
import itertools

def basis_vector(n, i):
    """ Return an array like [0, 0, ..., 1, ..., 0, 0]
    >>> from multipolyfit.core import basis_vector
    >>> basis_vector(3, 1)
    array([0, 1, 0])
    >>> basis_vector(5, 4)
    array([0, 0, 0, 0, 1])
    """
    x = zeros(n, dtype=int)
    x[i] = 1
    return x

def as_tall(x):
    """ Turns a row vector into a column vector """
    return x.reshape(x.shape + (1,))

def multipolyfit(xs, y, deg, full=False, model_out=False, powers_out=False):
    """
    Least squares multivariate polynomial fit
    Fit a polynomial like ``y = a**2 + 3a - 2ab + 4b**2 - 1``
    with many covariates a, b, c, ...
    Parameters
    ----------
    xs : array_like, shape (M, k)
         x-coordinates of the k covariates over the M sample points
    y :  array_like, shape(M,)
         y-coordinates of the sample points.
    deg : int
         Degree of the fitting polynomial
    model_out : bool (defaults to False)
         If True return a callable function
         If False return an array of coefficients
    powers_out : bool (defaults to False)
         Returns the meaning of each of the coefficients in the form of an
         iterator that gives the powers over the inputs and 1
         For example if xs corresponds to the covariates a,b,c then the array
         [1, 2, 1, 0] corresponds to 1**1 * a**2 * b**1 * c**0
    See Also
    --------
        numpy.polyfit
    """
    y = asarray(y).squeeze()
    rows = y.shape[0]
    xs = asarray(xs)
    num_covariates = xs.shape[1]
    xs = hstack((ones((xs.shape[0], 1), dtype=xs.dtype) , xs))

    generators = [basis_vector(num_covariates+1, i)
                            for i in range(num_covariates+1)]

    # All combinations of degrees (a list, so it can be reused on Python 3)
    powers = list(map(sum, itertools.combinations_with_replacement(generators, deg)))

    # Raise data to specified degree pattern, stack in order
    A = hstack(asarray([as_tall((xs**p).prod(1)) for p in powers]))

    beta = linalg.lstsq(A, y, rcond=None)[0]

    if model_out:
        return mk_model(beta, powers)

    if powers_out:
        return beta, powers
    return beta

def mk_model(beta, powers):
    """ Create a callable python function out of beta/powers from multipolyfit
    This function is callable from within multipolyfit using the model_out flag
    """
    # Create a function that takes in many x values
    # and returns an approximate y value
    def model(*args):
        num_covariates = len(powers[0]) - 1
        if len(args)!=(num_covariates):
            raise ValueError("Expected %d inputs"%num_covariates)
        xs = asarray((1,) + args)
        return sum([coeff * (xs**p).prod()
                             for p, coeff in zip(powers, beta)])
    return model

def mk_sympy_function(beta, powers):
    from sympy import symbols, Add, Mul, S
    num_covariates = len(powers[0]) - 1
    xs = (S.One,) + symbols('x0:%d'%num_covariates)
    return Add(*[coeff * Mul(*[x**deg for x, deg in zip(xs, power)])
                        for power, coeff in zip(powers, beta)])
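
One detail that may matter when running this code on Python 3: `map` returns a one-shot iterator rather than a list. In `multipolyfit`, the exponent patterns in `powers` are consumed while building `A`, so they need to be held in a list if they are to be iterated again or indexed (as `mk_model` does with `powers[0]`). The short sketch below only demonstrates that iterator behaviour on a two-covariate, degree-2 case. The variable names are illustrative and it does not perform a fit.

```python
import itertools
import numpy as np

# Unit vectors for (1, a, b): the same role basis_vector plays in the code above
generators = [np.eye(3, dtype=int)[i] for i in range(3)]

# On Python 3, map() returns an iterator that can be consumed only once
powers_iter = map(sum, itertools.combinations_with_replacement(generators, 2))
first_pass = [tuple(int(v) for v in p) for p in powers_iter]
second_pass = [tuple(int(v) for v in p) for p in powers_iter]  # nothing left

# Wrapping the map in list() keeps the exponent patterns available
powers_list = list(map(sum, itertools.combinations_with_replacement(generators, 2)))

print(len(first_pass), len(second_pass), len(powers_list))  # prints: 6 0 6
```

A `map` object also cannot be subscripted, so an expression like `powers[0]` raises a `TypeError` unless `powers` is a list.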

Errors:

************* Module untitled2
C0326: 49,56: : No space allowed before comma
    xs = hstack((ones((xs.shape[0], 1), dtype=xs.dtype) , xs))

                                                        ^
C0330: 52,0: : Wrong continued indentation (remove 10 spaces).
                            for i in range(num_covariates+1)]

                  |         ^
C0326: 77,20: : Exactly one space required around comparison
        if len(args)!=(num_covariates):

                    ^^
C0330: 81,0: : Wrong continued indentation (remove 9 spaces).
                             for p, coeff in zip(powers, beta)])

                    |        ^
C0330: 89,0: : Wrong continued indentation (remove 7 spaces).
                        for power, coeff in zip(powers, beta)])

                 |      ^
C0303: 94,18: : Trailing whitespace
C0326: 96,10: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

          ^
C0326: 96,13: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

             ^
C0326: 96,16: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                ^
C0326: 96,19: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                   ^
C0326: 96,22: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                      ^
C0326: 96,25: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                         ^
C0326: 96,29: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                             ^
C0326: 96,32: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                                ^
C0326: 96,36: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                                    ^
C0326: 96,39: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                                       ^
C0326: 96,43: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                                           ^
C0326: 96,47: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                                               ^
C0326: 96,51: : Exactly one space required after comma
data = [[1,1],[4,3],[8,3],[11,4],[10,7],[15,11],[16,12]]

                                                   ^
C0326:100,26: : Exactly one space required after comma
stacked_x = numpy.array([x,x+1,x-1])

                          ^
C0326:100,30: : Exactly one space required after comma
stacked_x = numpy.array([x,x+1,x-1])

                              ^
C0303:101,31: : Trailing whitespace
C0304:104,0: : Final newline missing
C0114:  1,0: : Missing module docstring
C0103:  4,0: basis_vector: Argument name "n" doesn't conform to snake_case naming style
W0621: 12,4: basis_vector: Redefining name 'x' from outer scope (line 97)
C0103: 12,4: basis_vector: Variable name "x" doesn't conform to snake_case naming style
C0103: 16,0: as_tall: Argument name "x" doesn't conform to snake_case naming style
W0621: 16,12: as_tall: Redefining name 'x' from outer scope (line 97)
C0103: 20,0: multipolyfit: Argument name "xs" doesn't conform to snake_case naming style
C0103: 20,0: multipolyfit: Argument name "y" doesn't conform to snake_case naming style
W0621: 20,21: multipolyfit: Redefining name 'y' from outer scope (line 97)
R0913: 20,0: multipolyfit: Too many arguments (6/5)
C0103: 58,4: multipolyfit: Variable name "A" doesn't conform to snake_case naming style
W0613: 20,29: multipolyfit: Unused argument 'full'
W0612: 46,4: multipolyfit: Unused variable 'rows'
C0103: 79,8: mk_model.model: Variable name "xs" doesn't conform to snake_case naming style
C0116: 84,0: mk_sympy_function: Missing function or method docstring
C0103: 87,4: mk_sympy_function: Variable name "xs" doesn't conform to snake_case naming style
C0413: 92,0: : Import "import numpy" should be placed at the top of the module
C0413: 93,0: : Import "import matplotlib.pyplot as plt" should be placed at the top of the module
C0103: 94,0: : Constant name "mpf" doesn't conform to UPPER_CASE naming style
C0103: 96,0: : Constant name "data" doesn't conform to UPPER_CASE naming style
C0103: 97,0: : Constant name "x" doesn't conform to UPPER_CASE naming style
C0103: 97,3: : Constant name "y" doesn't conform to UPPER_CASE naming style
C0103:100,0: : Constant name "stacked_x" doesn't conform to UPPER_CASE naming style
C0103:101,0: : Constant name "coeffs" doesn't conform to UPPER_CASE naming style
E0602:101,27: : Undefined variable 'deg'
C0103:102,0: : Constant name "x2" doesn't conform to UPPER_CASE naming style
C0103:103,0: : Constant name "y2" doesn't conform to UPPER_CASE naming style
C0411:  2,0: : standard import "import itertools" should be placed before "from numpy import linalg, zeros, ones, hstack, asarray"

------------------------------------

Your code has been rated at -1.70/10'''
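
For readers of this output: nearly all of the messages are pylint convention (C), refactor (R), and warning (W) checks, which do not stop a script from running. The only error-level (E) message is E0602, an undefined variable `deg` at line 101 of the script. The calling part of the script is not shown here, so the following is only a sketch. It assumes `x` and `y` are the two columns of the `data` list quoted in the lint output and that a cubic (`deg = 3`) was intended, and it uses `numpy.polyfit` on the single `x` column instead of `multipolyfit` to stay self-contained.

```python
import numpy as np

# Data list as quoted in the pylint output above
data = [[1, 1], [4, 3], [8, 3], [11, 4], [10, 7], [15, 11], [16, 12]]
x, y = np.array(data, dtype=float).T

# Define the degree before it is used (assumed to be 3 here)
deg = 3

# Univariate least-squares polynomial fit; coefficients are highest power first
coeffs = np.polyfit(x, y, deg)
y_fit = np.polyval(coeffs, x)

print(coeffs)
print(y_fit)
```

The same principle applies to the multivariate call: whatever name is passed as the degree argument has to be assigned before the line that uses it.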

0 answers:

No answers