statsmodel:模拟数据并运行简单的线性回归

时间:2017-02-02 14:39:17

标签: python statsmodels

我是python statsmodels包的新手。我尝试模拟与log(x)线性相关的一些数据,并使用statsmodels公式接口运行简单的线性回归。以下是代码:

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

B0 = 3
B1 = 0.5
x = np.linspace(10, 1e4, num = 1000)
epsilon = np.random.normal(0,3, size=1000)

y=B0 + B1*np.log(x)+epsilon
df1 = pd.DataFrame({'Y':y, 'X':x})

model = smf.OLS ('Y~np.log(X)', data=df1).fit()

我收到以下错误:

ValueError                                Traceback (most recent call last)
<ipython-input-34-c0ab32ca2acf> in <module>()
      7 y=B0 + B1*np.log(X)+epsilon
      8 df1 = pd.DataFrame({'Y':y, 'X':X})
----> 9 smf.OLS ('Y~np.log(X)', data=df1)

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/regression/linear_model.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
    689                  **kwargs):
    690         super(OLS, self).__init__(endog, exog, missing=missing,
--> 691                                   hasconst=hasconst, **kwargs)
    692         if "weights" in self._init_keys:
    693             self._init_keys.remove("weights")

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/regression/linear_model.py in __init__(self, endog, exog, weights, missing, hasconst, **kwargs)
    584             weights = weights.squeeze()
    585         super(WLS, self).__init__(endog, exog, missing=missing,
--> 586                                   weights=weights, hasconst=hasconst, **kwargs)
    587         nobs = self.exog.shape[0]
    588         weights = self.weights

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/regression/linear_model.py in __init__(self, endog, exog, **kwargs)
     89     """
     90     def __init__(self, endog, exog, **kwargs):
---> 91         super(RegressionModel, self).__init__(endog, exog, **kwargs)
     92         self._data_attr.extend(['pinv_wexog', 'wendog', 'wexog', 'weights'])
     93 

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/model.py in __init__(self, endog, exog, **kwargs)
    184 
    185     def __init__(self, endog, exog=None, **kwargs):
--> 186         super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
    187         self.initialize()
    188 

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/model.py in __init__(self, endog, exog, **kwargs)
     58         hasconst = kwargs.pop('hasconst', None)
     59         self.data = self._handle_data(endog, exog, missing, hasconst,
---> 60                                       **kwargs)
     61         self.k_constant = self.data.k_constant
     62         self.exog = self.data.exog

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
     82 
     83     def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
---> 84         data = handle_data(endog, exog, missing, hasconst, **kwargs)
     85         # kwargs arrays could have changed, easier to just attach here
     86         for key in kwargs:

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
    562         exog = np.asarray(exog)
    563 
--> 564     klass = handle_data_class_factory(endog, exog)
    565     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
    566                  **kwargs)

/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/data.py in handle_data_class_factory(endog, exog)
    551     else:
    552         raise ValueError('unrecognized data structures: %s / %s' %
--> 553                          (type(endog), type(exog)))
    554     return klass
    555 

ValueError: unrecognized data structures: <class 'str'> / <class 'NoneType'>

我检查了文件,一切似乎都是正确的。花了很长时间试图理解为什么我得到这些错误,但无法弄清楚。非常感谢帮助。

1 个答案:

答案 0 :(得分:6)

在statsmodels.formula.api中,ols方法是小写的。 在statsmodels.api中,OLS全部为大写。 在你的情况下,你需要......

model = smf.ols('Y~np.log(X)', data=df1).fit()