Training a dataset - ValueError: Unknown label type: 'continuous'

Time: 2019-03-14 16:55:27

Tags: python-3.x pandas sklearn-pandas

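I am trying to normalize my dataset and train a model on it. However, I keep getting this error and I don't know what causes it. I am experimenting with different preprocessing types and models to see which works best for the dataset. The values in df_norm are of type 'numpy.float64', and I have read that converting them to int would work, but that doing so is a hacky approach. Is the problem that the values are floats? If so, is there a practical way to fix this that is not "hacky"? Thanks in advance.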

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn import preprocessing, linear_model, svm
from sklearn.model_selection import train_test_split

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500) 
pd.set_option('display.width', 1000)

raw_df = pd.read_csv('parkinsons_updrs.data.txt', index_col=False)

#Check for missing data
#print(pd.isnull(raw_df).sum())

#Grouping the patients by subject #

df_mean = pd.DataFrame()

group = raw_df.groupby('subject#')

for patient, medical_data in group:
    #print(patient)
    #print(medical_data)
    df_mean = df_mean.append(medical_data.agg(np.mean), ignore_index=True)

df_mean.set_index('subject#', inplace=True)
#df_mean.to_html('Parkinsons Patients Mean Data.html')


#Data Scaling

#Normalization
df_norm = preprocessing.normalize(df_mean)
cols = df_mean.columns.values
df_norm = pd.DataFrame(df_norm, columns=cols)
labels_norm = df_norm.pop('total_UPDRS')

#Label Encoding
df_le = pd.DataFrame()
le = preprocessing.LabelEncoder()
for col in df_mean.columns.values:
    le.fit(df_mean[col])
    df_le[col] = le.transform(df_mean[col])
labels_le = df_le.pop('total_UPDRS')

#Split the data
x_train, x_test, y_train, y_test = train_test_split(df_norm, labels_norm, test_size=0.2, random_state=0)


#Make the Model - Logistic Regression
log_regr = linear_model.LogisticRegression()
log_regr.fit(x_train, y_train)

#Predict
y_pred_norm = log_regr.predict(x_test)

correct = 0
for i in range(len(y_pred_norm)):
    if y_pred_norm[i] == y_test.iloc[i]:
        correct += 1

print('Normalized Accuracy: ', correct / len(y_pred_norm))

Error:

Warning (from warnings module):
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\linear_model\logistic.py", line 433
    FutureWarning)
FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
Traceback (most recent call last):
  File "C:/Users/andre/AppData/Local/Programs/Python/Python37/Machine Learning/Parkinsons Telemonitoring Data/Parkinsons Telemonitoring Data - Attempt 2.py", line 112, in <module>
    log_regr.fit(x_train, y_train)
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\linear_model\logistic.py", line 1289, in fit
    check_classification_targets(y)
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 171, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
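
The traceback shows `check_classification_targets` rejecting `y_train`: `LogisticRegression` is a classifier, and float targets with non-integer values are detected as 'continuous'. The sketch below is one possible way to look at this, not a definitive fix. It uses synthetic data as a stand-in for the Parkinson's file (the feature values, the coefficients, and the three-bin split are all assumptions made for illustration). It reproduces the error, then shows two options: fitting a regressor (`LinearRegression` is swapped in here as an example) if `total_UPDRS` should be predicted as a number, or binning the target into discrete classes if classification is really what is wanted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the real data (hypothetical values, not the actual dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# scikit-learn inspects the target values; non-integer floats count as 'continuous'.
target_kind = type_of_target(y_train)
print('Target type:', target_kind)

# A classifier refuses such a target, which is the error from the traceback.
try:
    LogisticRegression().fit(x_train, y_train)
    classifier_error = None
except ValueError as exc:
    classifier_error = str(exc)
print('Classifier error:', classifier_error)

# Option 1: treat the task as regression and score it with R^2 instead of accuracy.
lin_regr = LinearRegression().fit(x_train, y_train)
r2 = lin_regr.score(x_test, y_test)
print('Regression R^2:', r2)

# Option 2: bin the continuous target into classes (here: three quantile bins,
# an arbitrary choice), using edges computed on the training targets only.
bin_edges = np.quantile(y_train, [1 / 3, 2 / 3])
y_train_cls = np.digitize(y_train, bin_edges)
y_test_cls = np.digitize(y_test, bin_edges)

log_regr = LogisticRegression(max_iter=1000).fit(x_train, y_train_cls)
accuracy = log_regr.score(x_test, y_test_cls)
print('Binned-class accuracy:', accuracy)
```

Casting the floats to int would also make the error go away, but it turns every distinct integer into its own class, which is probably why it is described as hacky. Explicit bins at least make the class boundaries a deliberate choice.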

0 answers:

No answers