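This is my first post on StackOverflow after using the site for a while.
I have been trying to predict the last column of a practice machine learning dataset from this link: http://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008#
When I run the code below, I get this error: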
Traceback (most recent call last):
  File "", line 1, in
    runfile('/Users/ric4711/diabetes_tensorflow', wdir='/Users/ric4711')
  File "/Users/ric4711/anaconda/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "/Users/ric4711/anaconda/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
    builtins.execfile(filename, *where)
  File "/Users/ric4711/diabetes_tensorflow", line 60, in
    y_train = to_categorical(y_train, num_classes = num_classes)
  File "/Users/ric4711/anaconda/lib/python2.7/site-packages/keras/utils/np_utils.py", line 25, in to_categorical
    categorical[np.arange(n), y] = 1
IndexError: index 3 is out of bounds for axis 1 with size 3
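I suspect there is a problem with the dimensions of my y data, or with how I am handling the categories for it. Any help would be greatly appreciated.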
from pandas import read_csv
import numpy
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from keras.layers import Dense, Input
from keras.models import Model
dataset = read_csv(r"/Users/ric4711/Documents/dataset_diabetes/diabetic_data.csv", header=None)
#Column 2, 5, 10, 11, 18, 19, 20 all have "?"
#(101767, 50) size of dataset
#PROBLEM COLUMNS WITH NUMBER OF "?"
#2 2273
#5 98569
#10 40256
#11 49949
#18 21
#19 358
#20 1423
le=LabelEncoder()
dataset[[2,5,10,11,18,19,20]] = dataset[[2,5,10,11,18,19,20]].replace("?", numpy.NaN)
dataset = dataset.drop(dataset.columns[[0, 1, 5, 10, 11]], axis=1)
dataset.dropna(inplace=True)
y = dataset[[49]]
X = dataset.drop(dataset.columns[[44]], 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
for col in X_test.columns.values:
    if X_test[col].dtypes=='object':
        data=X_train[col].append(X_test[col])
        le.fit(data.values)
        X_train[col]=le.transform(X_train[col])
        X_test[col]=le.transform(X_test[col])
for col in y_test.columns.values:
    if y_test[col].dtypes=='object':
        data=y_train[col].append(y_test[col])
        le.fit(data.values)
        y_train[col]=le.transform(y_train[col])
        y_test[col]=le.transform(y_test[col])
batch_size = 500
num_epochs = 300
hidden_size = 250
num_test = X_test.shape[0]
num_training = X_train.shape[0]
height, width, depth = 1, X_train.shape[1], 1
num_classes = 3
y_train = y_train.as_matrix()
y_test = y_test.as_matrix()
y_train = to_categorical(y_train, num_classes = num_classes)
y_test = to_categorical(y_test, num_classes = num_classes)
inp = Input(shape=(height * width,))
hidden_1 = Dense(hidden_size, activation='tanh')(inp)
hidden_2 = Dense(hidden_size, activation='tanh')(hidden_1)
hidden_3 = Dense(hidden_size, activation='tanh')(hidden_2)
hidden_4 = Dense(hidden_size, activation='tanh')(hidden_3)
hidden_5 = Dense(hidden_size, activation='tanh')(hidden_4)
hidden_6 = Dense(hidden_size, activation='tanh')(hidden_5)
hidden_7 = Dense(hidden_size, activation='tanh')(hidden_6)
hidden_8 = Dense(hidden_size, activation='tanh')(hidden_7)
hidden_9 = Dense(hidden_size, activation='tanh')(hidden_8)
hidden_10 = Dense(hidden_size, activation='tanh')(hidden_9)
hidden_11 = Dense(hidden_size, activation='tanh')(hidden_10)
out = Dense(num_classes, activation='softmax')(hidden_11)
model = Model(inputs=inp, outputs=out)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size = batch_size,epochs = num_epochs, validation_split = 0.1, shuffle = True)
model.evaluate(X_test, y_test, verbose=1)

Answer 0 (score: 2)
I fixed this by changing num_classes to 4, and by passing numpy.array(X_train), numpy.array(y_train) to the .fit method.
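
To illustrate why 4 works where 3 does not, here is a minimal sketch using only NumPy. The `one_hot` helper is a hypothetical stand-in for `to_categorical`, built around the same indexing step shown in the traceback; it is not the Keras implementation. The sample labels are made up. They assume that reading the CSV with `header=None` leaves the header text (`readmitted`) in the label column as a fourth distinct value next to `<30`, `>30` and `NO`, which is one likely explanation for the extra class but is not confirmed in the question.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical, for illustration only."""
    labels = np.asarray(labels, dtype=int).ravel()
    n = labels.shape[0]
    categorical = np.zeros((n, num_classes))
    # Same indexing step as in the traceback: it fails if any label >= num_classes
    categorical[np.arange(n), labels] = 1
    return categorical


# Made-up label column: the header text plus the three real classes (assumption)
raw = np.array(["readmitted", "NO", ">30", "<30", "NO"])

# Integer-encode the strings; classes holds the sorted distinct values
classes, codes = np.unique(raw, return_inverse=True)

# With num_classes fixed at 3, the encoded label 3 is out of range
try:
    one_hot(codes, num_classes=3)
    failed = False
except IndexError:
    failed = True

# Deriving the class count from the data avoids the hard-coded value
num_classes = len(classes)
encoded = one_hot(codes, num_classes)
```

Printing the distinct label values before calling `to_categorical` is a quick way to check whether a stray value such as the header row has been encoded as a class. If it has, dropping that row (or reading the file with its header) may be preferable to training with a fourth class.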