Keras Index out of Bounds Error with CSV database

Date: 2017-07-10 15:15:59

Tags: python machine-learning keras

This is my first time posting on StackOverflow after using the site for a while.

I have been trying to predict the last column of a practice machine learning dataset from this link: http://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008#

I run the code below and receive this error:

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('/Users/ric4711/diabetes_tensorflow', wdir='/Users/ric4711')

  File "/Users/ric4711/anaconda/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "/Users/ric4711/anaconda/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
    builtins.execfile(filename, *where)

  File "/Users/ric4711/diabetes_tensorflow", line 60, in <module>
    y_train = to_categorical(y_train, num_classes=num_classes)

  File "/Users/ric4711/anaconda/lib/python2.7/site-packages/keras/utils/np_utils.py", line 25, in to_categorical
    categorical[np.arange(n), y] = 1

IndexError: index 3 is out of bounds for axis 1 with size 3

I suspect there may be an issue with the dimensions of my y data, or with how I am managing the categories for it. Any help would be greatly appreciated.
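One way to check that suspicion is to inspect the encoded target before one-hot encoding it. A minimal diagnostic sketch (not part of the original script, and assuming the label-encoded y_train produced by the script below):

# to_categorical allocates num_classes columns, so every encoded label must
# satisfy label <= num_classes - 1; printing the unique labels exposes any mismatch.
print(numpy.unique(y_train.values))    # distinct encoded labels, e.g. [0 1 2 3]
print(int(y_train.values.max()) + 1)   # the minimum valid num_classes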



from pandas import read_csv
import numpy
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from keras.layers import Dense, Input
from keras.models import Model

dataset = read_csv(r"/Users/ric4711/Documents/dataset_diabetes/diabetic_data.csv", header=None)
#Column 2, 5, 10, 11, 18, 19, 20 all have "?" 
#(101767, 50) size of dataset
#PROBLEM COLUMNS WITH NUMBER OF "?"
#2      2273
#5     98569
#10    40256
#11    49949
#18       21
#19      358
#20     1423
le=LabelEncoder()

dataset[[2,5,10,11,18,19,20]] = dataset[[2,5,10,11,18,19,20]].replace("?", numpy.NaN)

dataset = dataset.drop(dataset.columns[[0, 1, 5, 10, 11]], axis=1)
dataset.dropna(inplace=True)


y = dataset[[49]]
X = dataset.drop(dataset.columns[[44]], 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

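# Label-encode every string (object) feature column, fitting the encoder on the
# combined train and test values so both splits share a single label mapping.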
for col in X_test.columns.values:
    if X_test[col].dtypes=='object':
        data=X_train[col].append(X_test[col])
        le.fit(data.values)
        X_train[col]=le.transform(X_train[col])
        X_test[col]=le.transform(X_test[col])
        
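# Encode the string target column the same way.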
for col in y_test.columns.values:
    if y_test[col].dtypes=='object':
        data=y_train[col].append(y_test[col])
        le.fit(data.values)
        y_train[col]=le.transform(y_train[col])
        y_test[col]=le.transform(y_test[col])
        

batch_size = 500
num_epochs = 300
hidden_size = 250

num_test = X_test.shape[0]
num_training = X_train.shape[0]
height, width, depth = 1, X_train.shape[1], 1
num_classes = 3

y_train = y_train.as_matrix()
y_test = y_test.as_matrix()

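# One-hot encode the targets. This is the step from the traceback above:
# to_categorical builds an array with num_classes columns, so it raises an
# IndexError if any encoded label is >= num_classes.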
y_train = to_categorical(y_train, num_classes = num_classes)
y_test = to_categorical(y_test, num_classes = num_classes)

inp = Input(shape=(height * width,))
hidden_1 = Dense(hidden_size, activation='tanh')(inp) 
hidden_2 = Dense(hidden_size, activation='tanh')(hidden_1)
hidden_3 = Dense(hidden_size, activation='tanh')(hidden_2)
hidden_4 = Dense(hidden_size, activation='tanh')(hidden_3)
hidden_5 = Dense(hidden_size, activation='tanh')(hidden_4)
hidden_6 = Dense(hidden_size, activation='tanh')(hidden_5)
hidden_7 = Dense(hidden_size, activation='tanh')(hidden_6)
hidden_8 = Dense(hidden_size, activation='tanh')(hidden_7)
hidden_9 = Dense(hidden_size, activation='tanh')(hidden_8)
hidden_10 = Dense(hidden_size, activation='tanh')(hidden_9)
hidden_11 = Dense(hidden_size, activation='tanh')(hidden_10)
out = Dense(num_classes, activation='softmax')(hidden_11) 


model = Model(inputs=inp, outputs=out) 

model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy']) 


model.fit(X_train, y_train, batch_size = batch_size,epochs = num_epochs, validation_split = 0.1, shuffle = True)

model.evaluate(X_test, y_test, verbose=1) 




1 Answer:

Answer 0 (score: 2)

I fixed this by changing num_classes to 4, and applying numpy.array(X_train), numpy.array(y_train) in the .fit method.
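In other words, a minimal sketch of the two changes (everything else in the question's script unchanged):

num_classes = 4   # the encoded target contains 4 distinct labels (0-3), not 3

y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)

# Keras expects NumPy arrays here rather than pandas objects.
model.fit(numpy.array(X_train), numpy.array(y_train),
          batch_size=batch_size, epochs=num_epochs,
          validation_split=0.1, shuffle=True)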