I'm stuck on a model I'm developing on a popular skin-cancer image dataset (HAM10000). There are two issues I'd like some guidance on:
A.
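The original dataset has over 10K images, and nearly 7,000 of them belong to just one of the seven classes. I created a subset of 4,948 random images and ran a function on it that converts the images into a list of lists, where each inner list holds the image first and its class second. The same function drops every image that belongs to the dominant class (class 5, roughly 6,800 images). The idea was to even out the class distribution.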
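Here is a minimal sketch of the rebalancing idea described above. It assumes `training_data` is a list of `[image, label]` pairs, as built in the code further down. `class_counts` and `drop_class` are hypothetical helper names, and strings stand in for the image arrays. The sketch counts samples per class and then removes one class entirely.

```python
from collections import Counter

def class_counts(samples):
    """Count the [image, label] pairs per label."""
    return Counter(label for _, label in samples)

def drop_class(samples, label_to_drop):
    """Return a new list without the samples whose label is label_to_drop."""
    return [[image, label] for image, label in samples if label != label_to_drop]

# Strings stand in for the image arrays.
samples = [["img_a", 5], ["img_b", 5], ["img_c", 2], ["img_d", 6], ["img_e", 0]]
print(sorted(class_counts(samples).items()))  # [(0, 1), (2, 1), (5, 2), (6, 1)]

filtered = drop_class(samples, 5)
print(sorted(class_counts(filtered)))         # [0, 2, 6]
```

The surviving labels keep their original values (here 0, 2 and 6), so removing a class leaves a gap in the label range.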
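I re-ran the original model with a 6-neuron output (Dense) layer instead of 7 and got an error.
Am I missing a step to "tell" the model that there are only six possible classes? The model only runs when the output layer has seven neurons.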
Error:
```
Train on 1245 samples, validate on 312 samples
Epoch 1/30
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-138-8a3b40a69e37> in <module>
25 metrics=["accuracy"])
26
---> 27 model.fit(X_train, y_train, batch_size=32, epochs=30, validation_split=0.2)
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
778 validation_steps=validation_steps,
779 validation_freq=validation_freq,
--> 780 steps_name='steps_per_epoch')
781
782 def evaluate(self,
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
361
362 # Get outputs.
--> 363 batch_outs = f(ins_batch)
364 if not isinstance(batch_outs, list):
365 batch_outs = [batch_outs]
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
3290
3291 fetched = self._callable_fn(*array_vals,
-> 3292 run_metadata=self.run_metadata)
3293 self._call_fetch_callbacks(fetched[-len(self._fetches):])
3294 output_structure = nest.pack_sequence_as(
/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
1456 ret = tf_session.TF_SessionRunCallable(self._session._session,
1457 self._handle, args,
-> 1458 run_metadata_ptr)
1459 if run_metadata:
1460 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
InvalidArgumentError: Received a label value of 6 which is outside the valid range of [0, 6). Label values: 1 1 2 4 2 1 2 1 2 1 2 2 4 2 2 1 3 1 4 6 0 2 4 2 0 4 2 4 4 0 2 4
[[{{node loss_15/activation_63_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
```
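Some context on what the loss checks. Keras has no separate "number of classes" setting: the size of the final Dense layer defines it. With sparse categorical cross-entropy, every integer label must then lie in `[0, num_classes)`, which is exactly the condition the error above reports.

The plain-Python sketch below checks that condition and remaps an arbitrary label set onto `0..k-1`. `check_sparse_labels` and `make_contiguous` are hypothetical helpers, not Keras functions.

```python
def check_sparse_labels(labels, num_classes):
    """Raise if any label is outside [0, num_classes), the range a
    num_classes-way softmax with sparse cross-entropy accepts."""
    bad = sorted({y for y in labels if not 0 <= y < num_classes})
    if bad:
        raise ValueError(f"labels {bad} are outside the valid range [0, {num_classes})")

def make_contiguous(labels):
    """Map the distinct labels, in sorted order, onto 0..k-1.
    Returns the remapped labels and the old->new mapping."""
    mapping = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [mapping[y] for y in labels], mapping

labels = [1, 1, 2, 4, 2, 6, 0, 3]  # class 5 removed, class 6 still present
try:
    check_sparse_labels(labels, num_classes=6)
except ValueError as err:
    print(err)  # labels [6] are outside the valid range [0, 6)

remapped, mapping = make_contiguous(labels)
print(mapping)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 6: 5}
check_sparse_labels(remapped, num_classes=6)  # no error now
```

Build the mapping once from the full label set and reuse it for every split. Remapping each split separately could assign different codes if a class happens to be missing from one of them.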
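B.
I'm trying to add data augmentation, because the dataset is relatively small given the number of classes and how sparse the images are within each class. When I run the generator, I get the error below, which seems to point to a problem with one of the variables in the `validation_data` tuple. I don't understand what the problem is.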
Sample values from the test set (`X_test`, then `y_test`):
```
[[[[0.41568627]
[0.4 ]
[0.43137255]
...
[0.54509804]
[0.54901961]
[0.54509804]]
[[0.42352941]
[0.43137255]
[0.43921569]
...
[0.56078431]
[0.54117647]
[0.55294118]]
[[0.41960784]
[0.41960784]
[0.45490196]
...
[0.51764706]
[0.57254902]
[0.50588235]]
...
[[0.30980392]
[0.36470588]
[0.36470588]
...
[0.47058824]
[0.44705882]
[0.41960784]]
[[0.29803922]
[0.31764706]
[0.34509804]
...
[0.45098039]
[0.43921569]
[0.4 ]]
[[0.25882353]
[0.30196078]
[0.31764706]
...
[0.45490196]
[0.42745098]
[0.36078431]]]
[[[0.60784314]
[0.59215686]
[0.56862745]
...
[0.59607843]
[0.63921569]
[0.63529412]]
[[0.6627451 ]
[0.63137255]
[0.62352941]
...
[0.67843137]
[0.60784314]
[0.63529412]]
[[0.62745098]
[0.65098039]
[0.6 ]
...
[0.61568627]
[0.63921569]
[0.67058824]]
...
[[0.62352941]
[0.6 ]
[0.59607843]
...
[0.6627451 ]
[0.71372549]
[0.6745098 ]]
[[0.61568627]
[0.58431373]
[0.61568627]
...
[0.67058824]
[0.65882353]
[0.68235294]]
[[0.61176471]
[0.60392157]
[0.61960784]
...
[0.65490196]
[0.6627451 ]
[0.66666667]]]]
[2, 1, 4, 4, 2]
```
Error:
```
Epoch 1/10
1/155 [..............................] - ETA: 11s - loss: 1.7916 - acc: 0.3000
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-139-8f19a958861f> in <module>
12 history = model.fit_generator(trainAug.flow(X_train, y_train, batch_size=batch_size)
13 ,epochs = 10, validation_data = (X_test, y_test),
---> 14 steps_per_epoch= X_train.shape[0]// batch_size
15 )
16 #epochs = epochs, validation_data = (X_test, y_test),
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1431 shuffle=shuffle,
1432 initial_epoch=initial_epoch,
-> 1433 steps_name='steps_per_epoch')
1434
1435 def evaluate_generator(self,
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
262
263 is_deferred = not model._is_compiled
--> 264 batch_outs = batch_function(*batch_data)
265 if not isinstance(batch_outs, list):
266 batch_outs = [batch_outs]
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
1173 self._update_sample_weight_modes(sample_weights=sample_weights)
1174 self._make_train_function()
-> 1175 outputs = self.train_function(ins) # pylint: disable=not-callable
1176
1177 if reset_metrics:
/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
3290
3291 fetched = self._callable_fn(*array_vals,
-> 3292 run_metadata=self.run_metadata)
3293 self._call_fetch_callbacks(fetched[-len(self._fetches):])
3294 output_structure = nest.pack_sequence_as(
/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
1456 ret = tf_session.TF_SessionRunCallable(self._session._session,
1457 self._handle, args,
-> 1458 run_metadata_ptr)
1459 if run_metadata:
1460 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
InvalidArgumentError: Received a label value of 6 which is outside the valid range of [0, 6). Label values: 0 1 6 4 2 4 2 0 1 2
[[{{node loss_15/activation_63_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
```
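Judging by the frames above, the failure happens in `train_on_batch`. That means it occurs on a training batch produced by `trainAug.flow(...)`, during the first step of epoch 1 and before any validation runs. The listed label values (including the 6) therefore come from `y_train`, and the validation tuple does not appear to be the cause.

Passing NumPy arrays in `validation_data` is still the safest form, and `y_test` above is a plain list. Below is a sketch, assuming NumPy and a hypothetical helper `to_checked_arrays`, that converts either split to arrays and applies the same label-range check to both.

```python
import numpy as np

def to_checked_arrays(X, y, num_classes):
    """Hypothetical helper: convert images/labels to NumPy arrays and make sure
    every label is a valid class index for a num_classes-way softmax output."""
    X = np.asarray(X, dtype="float32")
    y = np.asarray(y, dtype="int64")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"{X.shape[0]} images but {y.shape[0]} labels")
    out_of_range = np.unique(y[(y < 0) | (y >= num_classes)])
    if out_of_range.size:
        raise ValueError(f"labels {out_of_range.tolist()} are outside [0, {num_classes})")
    return X, y

# Random 50x50 grayscale stand-ins; the labels are the ones shown above.
X_test = np.random.rand(5, 50, 50, 1)
y_test = [2, 1, 4, 4, 2]
X_val, y_val = to_checked_arrays(X_test, y_test, num_classes=6)
print(X_val.shape, y_val.dtype)  # (5, 50, 50, 1) int64

X_train = np.random.rand(10, 50, 50, 1)
y_train = [0, 1, 6, 4, 2, 4, 2, 0, 1, 2]  # labels of the failing training batch
try:
    to_checked_arrays(X_train, y_train, num_classes=6)
except ValueError as err:
    print(err)  # labels [6] are outside [0, 6)
```

Running the check on `y_train` before calling `fit_generator` surfaces the out-of-range label immediately, instead of partway through the first epoch.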
Code:
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
import os
import cv2

DATA_DIR = "/Users/namefolder/PycharmProjects/skin-cancer/HAM10000_images_part_1"
metadata = pd.read_csv(os.path.join(DATA_DIR, 'HAM10000_metadata.csv'))

lesion_type_dict = {'nv': 'Melanocytic nevi',
                    'mel': 'Melanoma',
                    'bkl': 'Benign keratosis-like lesions ',
                    'bcc': 'Basal cell carcinoma',
                    'akiec': 'Actinic keratoses',
                    'vasc': 'Vascular lesions',
                    'df': 'Dermatofibroma'}

metadata['cell_type'] = metadata['dx'].map(lesion_type_dict.get)
metadata['dx_code'] = pd.Categorical(metadata['dx']).codes

# keep only the image id and the diagnosis columns (dx_code is the categorical code)
metadata = metadata[['image_id', 'dx', 'dx_type', 'dx_code']]

training_data = []
IMG_SIZE = 50

# preparing training data
def creating_training_data(path):
    for img in os.listdir(path):
        try:
            img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            for index, row in metadata.iterrows():
                # keep every class except dx_code 5
                if (img == row['image_id'] + '.jpg') & (row['dx_code'] != 5):
                    try:
                        training_data.append([new_array, row['dx_code']])
                    except Exception as ee:
                        pass
        except Exception as e:
            pass
    return training_data

training_data = creating_training_data(DATA_DIR)

import random
random.shuffle(training_data)

# Split the data into X (features) and y (labels)
X_train = []
y_train = []
for features, label in training_data:
    X_train.append(features)
    y_train.append(label)

# Reshape to (samples, height, width, channels), the layout Keras Conv2D expects
X_train = np.array(X_train).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y_train = np.array(y_train)
# Normalize pixel values to [0, 1]
X_train = X_train / 255.0

# model configuration
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(6))
model.add(Activation("softmax"))

# integer labels + softmax output: sparse categorical cross-entropy
# (the SparseSoftmaxCrossEntropyWithLogits node in the traceback shows this loss was used)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# Data augmentation (tf.keras imports, to match the tf.keras model above)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau

# initialize the training data augmentation object
# (no rescale here: X_train was already divided by 255 above)
trainAug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.05,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
    horizontal_flip=True,
    fill_mode="nearest")

# initialize the validation (and testing) data augmentation object
# (X_test is already in [0, 1], so no rescale either)
valAug = ImageDataGenerator()

# set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# Augmented-images model development
trainAug.fit(X_train)

# Fit the model
# (X_test and y_test come from a train/test split that is not shown here)
epochs = 10
batch_size = 10
history = model.fit_generator(trainAug.flow(X_train, y_train, batch_size=batch_size),
                              epochs=epochs,
                              validation_data=(X_test, y_test),
                              steps_per_epoch=X_train.shape[0] // batch_size)
```
Answer (score: 2)
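Originally you had 7 labels, so your code expected the labels 0, 1, 2, 3, 4, 5, 6.
You removed label 5 from the dataset, which is fine. Now you have 6 labels in total.
Your code expects: 0, 1, 2, 3, 4, 5
But your data contains: 0, 1, 2, 3, 4, 6
After removing label 5, you need to convert label 6 to 5.
Something like the following: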
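```python
# inside creating_training_data(), in the `for index, row in metadata.iterrows():` loop
if (img == row['image_id'] + '.jpg') & (row['dx_code'] > 5):
    try:
        # shift label 6 down to 5 so the labels stay contiguous (0..5)
        training_data.append([new_array, row['dx_code'] - 1])
    except Exception as ee:
        pass
elif (img == row['image_id'] + '.jpg') & (row['dx_code'] < 5):
    try:
        training_data.append([new_array, row['dx_code']])
    except Exception as ee:
        pass
```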
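An alternative to shifting codes by hand, not part of the answer above and sketched here assuming pandas: filter out the unwanted class *before* computing `dx_code`. `pd.Categorical` then only sees the remaining classes and numbers them contiguously from 0. The toy DataFrame and its image ids are made up and stand in for `HAM10000_metadata.csv`.

```python
import pandas as pd

# Toy stand-in for HAM10000_metadata.csv: only the columns used here.
metadata = pd.DataFrame({
    "image_id": ["ISIC_a", "ISIC_b", "ISIC_c", "ISIC_d", "ISIC_e"],
    "dx": ["akiec", "nv", "vasc", "bcc", "nv"],
})

# 1. Drop the over-represented class first (.copy() avoids SettingWithCopyWarning).
metadata = metadata[metadata["dx"] != "nv"].copy()

# 2. Compute the codes afterwards: the categories are inferred from the values
#    that remain (sorted alphabetically), so the codes are contiguous from 0.
metadata["dx_code"] = pd.Categorical(metadata["dx"]).codes

print(metadata["dx"].tolist())       # ['akiec', 'vasc', 'bcc']
print(metadata["dx_code"].tolist())  # [0, 2, 1]
```

With the real metadata and `nv` removed, this should give `akiec`=0, `bcc`=1, `bkl`=2, `df`=3, `mel`=4, `vasc`=5. If you go this way, drop the `dx_code != 5` condition in `creating_training_data`, because code 5 would then mean `vasc`.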