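I'm trying to run a logistic regression with statsmodels and I ran into this problem. I've searched for it but couldn't find how to solve it, and I'm not sure which part of my preprocessing is causing it.
Here is my code: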
df['Class'].replace('benign',2, inplace=True)
df['Class'].replace('malignant',4, inplace=True)
df.replace('?',10**9,inplace=True)
df.isnull().values.any()#this gives false, data has no missing value
X = df.drop(['Class','Code'], 1)
X = pd.DataFrame({"Constant":np.ones(len(X))}).join(pd.DataFrame(X))
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train.astype(int)
y_train.astype(int)
X_train = np.array(X_train)
y_train = np.array(y_train)
The error is raised here:
clf = sm.Logit(y_train, X_train)
This is the message:
Traceback (most recent call last):
  File "<ipython-input-204-dfe5f93d6f95>", line 1, in <module>
    clf = sm.Logit(y_train, X_train)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 402, in __init__
    super(BinaryModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 155, in __init__
    super(DiscreteModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 212, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 63, in __init__
    **kwargs)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\base\model.py", line 88, in _handle_data
    data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\base\data.py", line 630, in handle_data
    **kwargs)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\base\data.py", line 79, in __init__
    self._handle_constant(hasconst)
  File "C:\Users\DELL\Anaconda3\lib\site-packages\statsmodels\base\data.py", line 131, in _handle_constant
    const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
TypeError: '>=' not supported between instances of 'str' and 'int'
I'm new to machine learning.
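
For anyone trying to reproduce this, here is a minimal diagnostic sketch. It is not the real dataset: the column names `Thickness` and `Nuclei` and all values are made up, and it assumes that the column containing `'?'` was read in as strings. Under that assumption, `replace('?', 10**9)` leaves a column that mixes `str` and `int`, and because `astype(int)` returns a new object instead of modifying in place, the unassigned calls in the code above would have no effect. The sketch checks the dtypes, triggers the same kind of comparison error with `np.ptp`, and shows one possible cleanup with `pd.to_numeric`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data (names and values are invented).
# "Nuclei" holds digits as strings plus a '?' placeholder.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4],
    "Thickness": [5, 3, 8, 1],
    "Nuclei": pd.Series(["1", "10", "?", "4"], dtype=object),
    "Class": ["benign", "malignant", "malignant", "benign"],
})

# Swapping '?' for an int leaves the other entries as strings,
# so the column keeps dtype object and now mixes str and int.
replaced = df.replace("?", 10**9)
X = replaced.drop(columns=["Class", "Code"])

# astype() returns a converted copy; without assignment X is unchanged.
X.astype(int)
still_object = X["Nuclei"].dtype == object

# Taking the range (max - min) of a mixed str/int object array fails,
# because the elements cannot be compared with each other.
raised = False
try:
    np.ptp(np.array(X), axis=0)
except TypeError:
    raised = True

# One possible cleanup: convert every column to a numeric dtype and
# keep the result by assigning it to a name.
X_clean = X.apply(pd.to_numeric)

# Assumption: the model wants a 0/1 response rather than 2/4.
y = df["Class"].map({"benign": 0, "malignant": 1})

print(X.dtypes)
print(X_clean.dtypes)
```

If `X.dtypes` on the real data shows `object` for any column, that column is a likely source of the `str` versus `int` comparison. Whether `'?'` should become a large number, be dropped, or be imputed is a separate modelling choice that this sketch does not settle.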