I have a set of 300,000+ images across 38 classes. When I train, my val_loss is low and my val_acc is high, but when I try to predict one of the images (even one from the training set) it doesn't give the correct answer, and sometimes no sensible answer at all.
val_loss is around 0.1023 and val_acc around 0.9738.
The images are generated by jpgraph and show different kinds of up/down data for specific measurement log data from my aquarium computer, which relates to water stability. I have 5 million MySQL rows, rendered into 5-minute timeline images with 4 values each. I have uploaded 4 of the images at https://ponne.nu/images/ so you can look at them.
What I want to do is predict the class of the next step, i.e. 5 minutes ahead: after training, I show a picture and get back a class based on the data (e.g. 200 up, 500 down).
So whenever I train, with any combination I have tried, it gives good results on acc, loss, val_acc and val_loss as far as I can tell.
The training saves every model, so I can test after each epoch:
Epoch 00023: saving model to /opt/graphs/saved/saved-model-23-0.97.hdf5
Epoch 24/50
37101/37101 [==============================] - 2013s 54ms/step - loss: 0.0968 - acc: 0.9738 - val_loss: 0.1048 - val_acc: 0.9731
Epoch 00024: saving model to /opt/graphs/saved/saved-model-24-0.97.hdf5
Epoch 25/50
37101/37101 [==============================] - 2014s 54ms/step - loss: 0.0968 - acc: 0.9738 - val_loss: 0.1012 - val_acc: 0.9734
Epoch 00025: saving model to /opt/graphs/saved/saved-model-25-0.97.hdf5
Epoch 26/50
37101/37101 [==============================] - 2016s 54ms/step - loss: 0.0968 - acc: 0.9738 - val_loss: 0.1092 - val_acc: 0.9725
Part of the training script:
FAST_RUN = False
IMAGE_WIDTH=220
IMAGE_HEIGHT=220
IMAGE_SIZE=(IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS=3 # RGB color
import os
import pandas as pd

filenames = os.listdir("/opt/images/")
categories = []
for filename in filenames:
    category = filename.split('.')[0]
    if category == '0same':
        categories.append(0)
    elif category == '100up':
        categories.append(1)
    elif category == '200up':
        categories.append(2)
    elif category == '300up':
        categories.append(3)
    # (snip)

df = pd.DataFrame({
    'filename': filenames,
    'category': categories
})
df['category'] = df['category'].astype('str')
And so on for all the other categories (up to 38 classes), which leads to a Dense layer of 38.
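As an aside, the long if/elif chain above can be collapsed into a lookup dict. A minimal sketch, assuming the label names follow the pattern shown in the snippet ('0same', '100up', '200up', ...); the helper name `category_from_filename` is just for illustration:

```python
# Hypothetical sketch: map filename prefixes to class ids with a dict
# instead of a long if/elif chain.
label_map = {'0same': 0, '100up': 1, '200up': 2, '300up': 3}  # extend to all 38

def category_from_filename(filename):
    # Same prefix extraction as in the original script.
    prefix = filename.split('.')[0]
    return label_map[prefix]

print(category_from_filename('200up.12345.jpg'))  # -> 2
```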
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(38, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
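A side note on the reported metric, which may be relevant here: when compiling with binary_crossentropy on a 38-way one-hot target, Keras reports element-wise binary accuracy, which can be high even when the predicted class is wrong. A small NumPy sketch with hypothetical values to illustrate:

```python
import numpy as np

# One-hot target for class 5 out of 38 classes.
y_true = np.zeros(38)
y_true[5] = 1.0

# A prediction that puts its peak on the WRONG class (class 20).
y_pred = np.full(38, 0.01)
y_pred[20] = 0.6

# Element-wise binary accuracy, as reported when the loss is binary_crossentropy:
binary_acc = np.mean((y_pred > 0.5) == y_true.astype(bool))
print(binary_acc)  # 36 of 38 positions "match" -> ~0.947, despite a wrong argmax
```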
The image generators:
train_df, validate_df = train_test_split(df, test_size=0.27, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)
total_train = train_df.shape[0]
total_validate = validate_df.shape[0]
batch_size=10
train_datagen = ImageDataGenerator(
    rescale=1./255
)
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    "/opt/images/",
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df,
    "/opt/images/",
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)
epochs=50
history = model.fit_generator(
    train_generator,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size,
    callbacks=callbacks
)
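One thing worth checking around this generator setup: flow_from_dataframe builds its class_indices by sorting the string labels alphabetically, and since category was cast to str, '10' sorts before '2'. A quick pure-Python sketch of the resulting order (no Keras needed):

```python
# flow_from_dataframe sorts string labels alphabetically to build class_indices.
# Casting numeric categories to str therefore scrambles the order:
labels = [str(i) for i in range(38)]
alphabetical = sorted(labels)
print(alphabetical[:6])  # ['0', '1', '10', '11', '12', '13'] -- not numeric order

# So the index returned by predict_classes() may not equal int(label);
# the actual mapping lives in train_generator.class_indices.
```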
Somewhere in the code I split the data into separate validation/training chunks, so validation runs on images the model hasn't seen.
Then the part that does the prediction, which doesn't work:
batch_size=1
IMAGE_WIDTH=220
IMAGE_HEIGHT=220
IMAGE_SIZE=(IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS=3
With batch size 1, the model is defined again, this time without dropout. After that I load the weights:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.load_weights('saved/saved-model-33-0.97.hdf5')
The actual prediction: give it an image and let it predict the class:
def load_image(img_path, show=False):
    img = image.load_img(img_path, target_size=(220, 220))
    img_tensor = image.img_to_array(img)             # (height, width, channels)
    img_tensor = np.expand_dims(img_tensor, axis=0)  # (1, height, width, channels): the model expects (batch_size, height, width, channels)
    img_tensor /= 255.                               # imshow expects values in the range [0, 1]
    if show:
        plt.imshow(img_tensor[0])
        plt.axis('off')
        plt.show()
    return img_tensor
img_path = '/opt/testimages/test900up.jpg'
new_image = load_image(img_path)
pred = model.predict_classes(new_image, batch_size=1, verbose=1)
print(pred)
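Note that predict_classes returns the generator's internal index, not the original label, so the index has to be mapped back. A hedged sketch of inverting a (hypothetical, truncated) class_indices dict; in the real script the dict would come from train_generator.class_indices:

```python
# Hypothetical example of class_indices as produced by flow_from_dataframe;
# in the real script you would use train_generator.class_indices.
class_indices = {'0': 0, '1': 1, '10': 2, '11': 3}  # truncated for illustration

# Invert index -> label so a predicted index can be mapped back to its label.
index_to_label = {v: k for k, v in class_indices.items()}
pred_index = 2  # e.g. what predict_classes() returned
print(index_to_label[pred_index])  # -> '10'
```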
That's it. After saving 50 models, some of the saved models sometimes get close, but other times only on random images; you could call it blind luck rather than a real prediction.
Since the data is more like a streaming type, I first tried this with an LSTM on the raw data, with the same result as with the images: high accuracy and low loss, but the predictions were completely wrong. How can the validation data look so good statistically while prediction on the very same images is so bad? What am I doing wrong here? Please note that I'm a novice programmer.