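I am trying to normalize my dataset and train a model on it, but I keep getting the error below and I don't know what causes it. I am experimenting with different preprocessing types and models to see which approach works best for this dataset. The values in df_norm are of type "numpy.float64", and I have read that converting them to int would work, but that approach has problems of its own. Is the issue that the values are floats? If so, is there a practical way to fix this without making the data "problematic"? Thanks in advance.
Code: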
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing, linear_model, svm
from sklearn.model_selection import train_test_split
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
raw_df = pd.read_csv('parkinsons_updrs.data.txt', index_col=False)
#Check for missing data
#print(pd.isnull(raw_df).sum())
#Grouping the patients by subject #
df_mean = pd.DataFrame()
group = raw_df.groupby('subject#')
for patient, medical_data in group:
    #print(patient)
    #print(medical_data)
    df_mean = df_mean.append(medical_data.agg(np.mean), ignore_index=True)
df_mean.set_index('subject#', inplace=True)
#df_mean.to_html('Parkinsons Patients Mean Data.html')
#Data Scaling
#Normalization
df_norm = preprocessing.normalize(df_mean)
cols = df_mean.columns.values
df_norm = pd.DataFrame(df_norm, columns=cols)
labels_norm = df_norm.pop('total_UPDRS')
#Label Encoding
df_le = pd.DataFrame()
le = preprocessing.LabelEncoder()
for col in df_mean.columns.values:
    le.fit(df_mean[col])
    df_le[col] = le.transform(df_mean[col])
labels_le = df_le.pop('total_UPDRS')
#Split the data
x_train, x_test, y_train, y_test = train_test_split(df_norm, labels_norm, test_size=0.2, random_state=0)
#Make the Model - Logistic Regression
log_regr = linear_model.LogisticRegression()
log_regr.fit(x_train, y_train)
#Predict
y_pred_norm = log_regr.predict(x_test)
correct = 0
for i in range(len(y_pred_norm)):
    if y_pred_norm[i] == y_test.iloc[i]:
        correct += 1
print('Normalized Accuracy: ', correct / len(y_pred_norm))
Error:
Warning (from warnings module):
File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\linear_model\logistic.py", line 433
FutureWarning)
FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
Traceback (most recent call last):
File "C:/Users/andre/AppData/Local/Programs/Python/Python37/Machine Learning/Parkinsons Telemonitoring Data/Parkinsons Telemonitoring Data - Attempt 2.py", line 112, in <module>
log_regr.fit(x_train, y_train)
File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\linear_model\logistic.py", line 1289, in fit
check_classification_targets(y)
File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\utils\multiclass.py", line 171, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
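For context on the traceback: LogisticRegression is a classifier, so check_classification_targets rejects a target made of non-integer floats, which scikit-learn labels 'continuous'. The following is only a minimal sketch of two possible ways around that, not a recommendation for this particular dataset. It uses synthetic data in place of parkinsons_updrs.data.txt, and the linear relationship, the number of bins, and the tercile cut points are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the real data (assumption): 200 rows, 5 features,
# and a continuous float target playing the role of total_UPDRS.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 25.0 + 10.0 * X[:, 0] + rng.normal(scale=2.0, size=200)

# The same kind of check that raises the error: non-integer float targets
# are reported as 'continuous', which classifiers do not accept.
target_kind = type_of_target(y)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Option 1: keep the target continuous and fit a regressor instead.
lin_regr = LinearRegression()
lin_regr.fit(x_train, y_train)
r2 = lin_regr.score(x_test, y_test)  # R^2 on the held-out rows

# Option 2: bin the target into discrete classes, then classify.
# Three bins at the training-set terciles are an arbitrary choice.
edges = np.quantile(y_train, [1 / 3, 2 / 3])
y_train_cls = np.digitize(y_train, edges)  # class ids 0, 1, 2
y_test_cls = np.digitize(y_test, edges)

log_regr = LogisticRegression(solver='lbfgs', max_iter=1000)
log_regr.fit(x_train, y_train_cls)
accuracy = log_regr.score(x_test, y_test_cls)

print('Target type:', target_kind)
print('Regression R^2:', r2)
print('Binned classification accuracy:', accuracy)
```

One thing to keep in mind about the int conversion mentioned in the question: preprocessing.normalize scales each row to unit norm by default, so every value in labels_norm lies between -1 and 1, and casting those to int would collapse nearly all of them to 0. Taking the target from df_mean before normalizing the features would avoid that, whichever option is used.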