I ran this code, but there seems to be an error on the lr.fit line. Does anyone know what to do?
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('2019.csv')
df1 = pd.DataFrame(df,columns=['GDP per capita', 'Social support'])
lr = LogisticRegression()
columns = ['GDP per capita', 'Social support']
X = df[columns]
y = df["Score"]
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20,random_state=0)
lr.fit(X_train,y_train)
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-afa10dbaa367> in <module>
19 X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.30,random_state=0)
20
---> 21 lr.fit(X_train,y_train)
22 predictions = lr.predict(X_test)
23 accuracy = accuracy_score(y_test, predictions)
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
1526 X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
1527 accept_large_sparse=solver != 'liblinear')
-> 1528 check_classification_targets(y)
1529 self.classes_ = np.unique(y)
1530 n_samples, n_features = X.shape
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
167 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
168 'multilabel-indicator', 'multilabel-sequences']:
--> 169 raise ValueError("Unknown label type: %r" % y_type)
170
171
ValueError: Unknown label type: 'continuous'
The full traceback is shown above. The code only works when I add .astype(int) to both X and y; without it, I get the error you see.
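To see why .astype(int) makes the error go away, you can ask scikit-learn how it classifies a target. The sketch below uses made-up Score values (not taken from 2019.csv). sklearn.utils.multiclass.type_of_target reports non-integer floats as 'continuous', which LogisticRegression rejects. Truncated integers count as 'multiclass', which it accepts. Only y needs converting here, because continuous features in X are fine.

```python
from sklearn.utils.multiclass import type_of_target

# Made-up happiness scores: continuous floats like the "Score" column
scores = [7.769, 7.6, 5.2, 4.9, 3.1]
print(type_of_target(scores))  # 'continuous' -> LogisticRegression raises "Unknown label type"

# What .astype(int) effectively does: truncate each float to an integer label
truncated = [int(s) for s in scores]  # [7, 7, 5, 4, 3]
print(type_of_target(truncated))  # 'multiclass' -> accepted, each integer becomes its own class
```

The truncation silently lumps scores such as 7.769 and 7.6 into the same class 7. Choosing the labels deliberately is usually clearer than relying on that side effect.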
Answer (score: 1)
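I went to Kaggle, searched, and found the 2019.csv file with the two columns you use. The data is about the happiness of people in countries around the world, relating factors such as GDP per capita to a happiness "Score". Fine, that worked for me.
Anyway, I edited 2019.csv to keep only the two data columns and the score. The Score column has to hold class labels, so I made it contain only zeros or ones (this is very important). LogisticRegression is a classifier and needs discrete labels, while the original Score values are continuous numbers, which is exactly what "Unknown label type: 'continuous'" is complaining about. I renamed the other two columns to GDP and SS and deleted all the other columns.
Columns in the edited 2019.csv: Score, GDP, SS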
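Instead of editing the CSV by hand, you could derive the 0/1 labels in pandas. This is a minimal sketch under a few assumptions. The cut-off is the median Score, but any threshold works. The new column name Happy is hypothetical. The values are made up and stand in for 2019.csv, reusing the column names from the question.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up stand-in for 2019.csv, using the question's column names
df = pd.DataFrame({
    "Score": [7.8, 7.6, 6.3, 5.2, 4.9, 3.1],
    "GDP per capita": [1.3, 1.4, 1.1, 0.9, 0.7, 0.3],
    "Social support": [1.6, 1.5, 1.4, 1.2, 1.0, 0.6],
})

# Assumption: "happy" (1) means a Score at or above the median, otherwise 0
threshold = df["Score"].median()
df["Happy"] = (df["Score"] >= threshold).astype(int)  # hypothetical column name

X = df[["GDP per capita", "Social support"]]  # features can stay continuous
y = df["Happy"]  # target now holds only 0 and 1
print(y.tolist())

lr = LogisticRegression().fit(X, y)  # no "Unknown label type" error now
print(lr.classes_)
```

This keeps the original file untouched, and you can change the threshold in one place.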
My code produces the following output when run in PyCharm on a MacBook Pro.
The number is the accuracy:
0.46875
Process finished with exit code 0
So it is not great to begin with (just under 47% accuracy), but it can easily be improved...
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# Edited 2019.csv with columns: Score (0 or 1), GDP, SS
df = pd.read_csv('2019.csv')

lr = LogisticRegression()
columns = ['GDP', 'SS']
X = df[columns]  # continuous features are fine
y = df["Score"]  # target holds class labels 0/1, not continuous values

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)

# Output:
# 0.46875
# Process finished with exit code 0
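If you would rather keep Score as a continuous number instead of turning it into labels, a regression model fits the problem more directly than a classifier. The sketch below swaps in LinearRegression and scores it with R² (r2_score) instead of accuracy. The data is made up, with two features standing in for GDP and SS and a noisy linear Score. It is not the 2019.csv data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Made-up data: two features (stand-ins for GDP and SS) and a continuous score
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.6, size=(100, 2))
y = 2.0 + 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.3, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

reg = LinearRegression().fit(X_train, y_train)  # continuous targets are allowed here
r2 = r2_score(y_test, reg.predict(X_test))  # R² measures fit quality; accuracy does not apply
print(r2)
```

With the real file you would use df[['GDP per capita', 'Social support']] as X and the original df["Score"] as y, with no .astype(int) needed.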
Hope this helps.