I am running a logistic regression classifier on a dataset that looks like this:
ID | feature1 | feature2 | feature3 | Match
0 | 6 | 9 | 9.5 | 1
1 | 9 | 7 | 3.9 | 0
2 | 7 | 3 | 5.8 | 1
My model is y (match) = f(feature1, feature2, feature3), where y is a binary variable. I am running the following code in Python:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('abc.csv', encoding='latin-1')
X = pd.DataFrame()
X['match'] = df['match']
X['feature1'] = df['feature1']
X['feature2'] = df['feature2']
X['feature3'] = df['feature3']
X = X.dropna(axis=0)  # Drop rows containing NAs
y = X['match'].to_frame()  # Categorical variable match [Yes, No]
y = np.ravel(y)  # Convert to a 1-D array
X = X.drop(['match'], axis=1)  # Drop y from X
X = X.to_numpy()  # Convert the DataFrame to a NumPy array (as_matrix() was removed in pandas 1.0)
# Splitting into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Applying logistic regression using sklearn
model_1 = LogisticRegression(penalty='l2', C=1)
model_1.fit(X_train, y_train)
model_1.predict(X_test)
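The code above returns [0, 0, 0, ..., 0, 0, 0] for model_1.predict(X_test). I have checked it in many places and cannot find anything wrong with the code. It runs without errors but produces unexpected results. Any help would be appreciated.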
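A minimal sketch of one check that might help narrow this down, assuming (this is not confirmed by the data shown above) that the match classes are heavily imbalanced. It uses synthetic stand-in data rather than abc.csv, and the names rng, class_counts and proba_1 are illustrative only. It prints the class counts in the training labels and the predicted probabilities that predict() thresholds at 0.5:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for abc.csv (hypothetical data, not the real file):
# three numeric features and a binary label where roughly 5% of rows are 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 3))
y = (rng.uniform(size=1000) < 0.05).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Default settings use an L2 penalty with C=1.0, as in model_1 above.
model = LogisticRegression(C=1.0)
model.fit(X_train, y_train)

# How many training rows belong to each class (index 0 and index 1)?
class_counts = np.bincount(y_train, minlength=2)
print("training class counts:", class_counts)

# For a binary problem, predict() labels a row 1 only when P(match=1) > 0.5.
proba_1 = model.predict_proba(X_test)[:, 1]
print("largest predicted P(match=1):", proba_1.max())
print("distinct predicted labels:", np.unique(model.predict(X_test)))
```

If the largest predicted probability stays below 0.5, an all-zero prediction is what the default threshold would give, and options such as class_weight='balanced' or a different decision threshold may be worth trying. This is only one possible explanation.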