I'm working on a Keras CIFAR-10 learning experiment with images obtained from Kaggle. The labels come as a CSV file with two columns, 'id' and 'label'. From here I know I need to turn the labels into tensors, but I don't know how. I've looked all over the internet but couldn't find anything about reading a label CSV from Kaggle like this, so maybe this isn't the right way to do it at all...
Here is the link: https://www.kaggle.com/c/cifar-10, but there are no kernels to use as an example.
Thanks in advance for your help.
I'm importing everything from tensorflow.keras.xxxxxx. Here is my code:
# Imports inferred from the calls below (the question only says "from tensorflow.keras.xxxxxx")
import os
import cv2
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)
# Add extension to id_code to train images
train_df['id'] = train_df['id'].apply(str) + ".png"
display(train_df.head())
def preprocess_image(path, sigmaX=40):
    """
    The whole preprocessing pipeline:
    1. Read in image
    2. Convert from BGR to RGB
    3. Resize image to desired size
    """
    image = cv2.imread(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (IMG_WIDTH, IMG_HEIGHT))
    return image
# Add Image augmentation to our generator
train_datagen = ImageDataGenerator(rotation_range=360,
                                   horizontal_flip=True,
                                   vertical_flip=True,
                                   validation_split=0.25,
                                   rescale=1. / 255)

# Use the dataframe to define train and validation generators
train_generator = train_datagen.flow_from_dataframe(train_df,
                                                    x_col='id',
                                                    y_col='label',
                                                    directory=TRAIN_IMG_PATH,
                                                    target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='other',
                                                    preprocessing_function=preprocess_image,
                                                    subset='training')

val_generator = train_datagen.flow_from_dataframe(train_df,
                                                  x_col='id',
                                                  y_col='label',
                                                  directory=TRAIN_IMG_PATH,
                                                  target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                  batch_size=BATCH_SIZE,
                                                  class_mode='other',
                                                  preprocessing_function=preprocess_image,
                                                  subset='validation')
Batch_Size = 64
epochs = 25

# loop over the number of models to train
for i in np.arange(0, 5):
    # initialize the optimizer and model
    print("[INFO] training model {}/{}".format(i + 1, 5))
    opt = Adam(lr=1e-5)
    conv_base = ResNet50(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.UpSampling2D((2, 2)))
    model.add(layers.UpSampling2D((2, 2)))
    model.add(layers.UpSampling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(10, activation='softmax'))

    early_stop = EarlyStopping('val_loss', patience=5)
    reduce_lr = ReduceLROnPlateau('val_loss', factor=0.01, patience=3, verbose=1)

    ############################################################################
    trained_models_path = './best_model_adam/'
    model_names = trained_models_path + 'epoch_{epoch:02d}_val_acc_{val_acc:.4f}_'
    model_checkpoint = ModelCheckpoint(model_names + "model_{}.hdf5".format(i), verbose=1, save_best_only=True)
    ############################################################################
    callbacks = [model_checkpoint, early_stop, reduce_lr]

    # model.compile(optimizer=optimizers.RMSprop(lr=2e-5), loss='binary_crossentropy', metrics=['acc'])
    model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy', metrics=['acc'])

    # train the network
    history = model.fit_generator(
        train_generator,
        epochs=epochs,
        steps_per_epoch=train_df.shape[0] // Batch_Size,
        validation_data=val_generator,
        validation_steps=val_generator.shape[0] // Batch_Size,
        # batch_size = Batch_Size,
        verbose=1,
        callbacks=[model_checkpoint, early_stop]
    )

    # save the model to disk
    p = ["./models/model_{}.model".format(i)]
    model.save(os.path.sep.join(p))

    # evaluate the network
    predictions = model.predict(testX, batch_size=64)
    report = classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=labelNames)

    # save the classification report to file
    p = ["./output/model_{}.txt".format(i)]
    f = open(os.path.sep.join(p), "w")
    f.write(report)
    f.close()
When I run fit_generator I get this error:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
244 """
245 return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 246 allow_broadcast=True)
247
248
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
252 ctx = context.context()
253 if ctx.executing_eagerly():
--> 254 t = convert_to_eager_tensor(value, ctx, dtype)
255 if shape is None:
256 return t
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
113 return t
114 else:
--> 115 return ops.EagerTensor(value, handle, device, dtype)
116
117
ValueError: could not convert string to float: 'horse'
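The last line is the key: with class_mode='other', flow_from_dataframe passes the y_col values straight through as raw numeric targets, so the string 'horse' cannot be cast to a float. One alternative, sketched here assuming the same DataFrame, paths and constants as in the code above, is to let the generator build the class index from the string labels itself:

```
# Sketch only: class_mode='categorical' maps the string labels in y_col to one-hot
# targets, so no manual label-to-number conversion is needed.
train_generator = train_datagen.flow_from_dataframe(train_df,
                                                    directory=TRAIN_IMG_PATH,
                                                    x_col='id',
                                                    y_col='label',
                                                    target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='categorical',
                                                    subset='training')
# The matching loss for the 10-way softmax head would then be 'categorical_crossentropy'.
```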
Answer 0 (score: 0)
You can convert the category labels to numbers and add a new column holding those numbers. scikit-learn has this built in (see the LabelEncoder sketch after the snippet below), but it is simple enough without it:
import pandas as pd

df = pd.DataFrame({'label': ['cat', 'dog', 'horse'], 'b': [1, 2, 3]})
all_labels = df.label.unique().tolist()
all_labels.sort()
label_to_number = {label: all_labels.index(label) for label in all_labels}
df['label_num'] = df.apply(lambda r: label_to_number[r.label], axis=1)
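For reference, the scikit-learn route mentioned above might look roughly like this (a sketch using LabelEncoder, which also sorts the class names alphabetically and therefore produces the same mapping as the manual version):

```
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['label_num'] = le.fit_transform(df['label'])  # 'cat' -> 0, 'dog' -> 1, 'horse' -> 2
# le.classes_ keeps the original names in index order, so predictions can be mapped back:
# le.inverse_transform([2]) -> ['horse']
```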
Now you can point your training at the new column (y_col='label_num'). All of this assumes plain integer categories are fine and that you don't need one-hot encoding; if you do, scikit-learn covers that as well. Judging from here, integer categories should be fine.
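If one-hot targets were ever needed instead, a minimal sketch (assuming the df, all_labels and label_num column from the snippet above) could use Keras directly:

```
from tensorflow.keras.utils import to_categorical

# Turn the integer codes into one-hot rows, e.g. 'horse' (2) -> [0., 0., 1.]
one_hot = to_categorical(df['label_num'].values, num_classes=len(all_labels))
print(one_hot.shape)  # (3, 3) for the toy DataFrame above
```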
Answer 1 (score: 0)
@jeremy_rutman, thanks! I got it working:
import pandas as pd

print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)
# Add extension to id_code to train images
train_df['id'] = train_df['id'].apply(str) + ".png"
all_labels = train_df['label'].unique().tolist()
all_labels.sort()
label_to_number = {label: all_labels.index(label) for label in all_labels}
train_df['label'] = train_df.apply(lambda r: label_to_number[r.label], axis=1)
display(train_df.head())
print(train_df['id'])
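A small optional addition on top of the snippet above: the Kaggle submission needs the class names back as strings, so it can be handy to keep the inverse mapping around (a sketch, assuming the label_to_number dict built above):

```
# Inverse of the mapping above, for turning predicted class indices
# back into the original label strings when writing the submission file
number_to_label = {number: label for label, number in label_to_number.items()}
# e.g. predicted_labels = [number_to_label[i] for i in predictions.argmax(axis=1)]
```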
The model is fitting now, but for some reason my two GPU cards are not kicking in... I think a lot of things got broken with TensorFlow 2.0, but that is another topic...
Thanks a lot for your help.