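I am new to the Python statsmodels package. I am trying to simulate some data that is linearly related to log(x) and run a simple linear regression using the statsmodels formula interface. Here is the code: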
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
B0 = 3
B1 = 0.5
x = np.linspace(10, 1e4, num = 1000)
epsilon = np.random.normal(0,3, size=1000)
y=B0 + B1*np.log(x)+epsilon
df1 = pd.DataFrame({'Y':y, 'X':x})
model = smf.OLS ('Y~np.log(X)', data=df1).fit()
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-34-c0ab32ca2acf> in <module>()
7 y=B0 + B1*np.log(X)+epsilon
8 df1 = pd.DataFrame({'Y':y, 'X':X})
----> 9 smf.OLS ('Y~np.log(X)', data=df1)
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/regression/linear_model.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
689 **kwargs):
690 super(OLS, self).__init__(endog, exog, missing=missing,
--> 691 hasconst=hasconst, **kwargs)
692 if "weights" in self._init_keys:
693 self._init_keys.remove("weights")
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/regression/linear_model.py in __init__(self, endog, exog, weights, missing, hasconst, **kwargs)
584 weights = weights.squeeze()
585 super(WLS, self).__init__(endog, exog, missing=missing,
--> 586 weights=weights, hasconst=hasconst, **kwargs)
587 nobs = self.exog.shape[0]
588 weights = self.weights
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/regression/linear_model.py in __init__(self, endog, exog, **kwargs)
89 """
90 def __init__(self, endog, exog, **kwargs):
---> 91 super(RegressionModel, self).__init__(endog, exog, **kwargs)
92 self._data_attr.extend(['pinv_wexog', 'wendog', 'wexog', 'weights'])
93
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/model.py in __init__(self, endog, exog, **kwargs)
184
185 def __init__(self, endog, exog=None, **kwargs):
--> 186 super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
187 self.initialize()
188
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/model.py in __init__(self, endog, exog, **kwargs)
58 hasconst = kwargs.pop('hasconst', None)
59 self.data = self._handle_data(endog, exog, missing, hasconst,
---> 60 **kwargs)
61 self.k_constant = self.data.k_constant
62 self.exog = self.data.exog
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
82
83 def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
---> 84 data = handle_data(endog, exog, missing, hasconst, **kwargs)
85 # kwargs arrays could have changed, easier to just attach here
86 for key in kwargs:
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
562 exog = np.asarray(exog)
563
--> 564 klass = handle_data_class_factory(endog, exog)
565 return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
566 **kwargs)
/Users/tiger/anaconda/lib/python3.5/site-packages/statsmodels/base/data.py in handle_data_class_factory(endog, exog)
551 else:
552 raise ValueError('unrecognized data structures: %s / %s' %
--> 553 (type(endog), type(exog)))
554 return klass
555
ValueError: unrecognized data structures: <class 'str'> / <class 'NoneType'>
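I checked the documentation and everything seems to be correct. I have spent a long time trying to understand why I get this error, but I cannot figure it out. Any help is much appreciated.
Answer 0 (score: 6)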
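In statsmodels.formula.api, the ols function is lowercase. In statsmodels.api, OLS is all uppercase. The uppercase OLS is the model class, whose constructor expects endog and exog data rather than a formula string, which is why the traceback reports the data structures as <class 'str'> and <class 'NoneType'>. In your case, you need: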
model = smf.ols('Y~np.log(X)', data=df1).fit()