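I am using jupyter-notebook with tensorflow-gpu 2.1.
When I try to feed data into the model, I get an error. Some images belong to two or more different classes. Since this is a multi-label task, I need to define the classes myself.
The paths and settings I am using: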
test_size=0.2
img_rows, img_cols, channels = 224,224,3
batch_size = 1
#Specify paths
df_path = '/home/Erdal.Genc/covid_work/dset_preprocessing/NIH/NIH_df_ed.csv'
img_path = '/mnt/dsets/ChestXrays/NIH/images'
outputfolder = '/home/Erdal.Genc/covid_work/image_analysis'
Splitting into train and test sets:
import pandas as pd
from sklearn.model_selection import train_test_split

#import dataframe (every column is read as a string because of dtype=str)
NIH_df = pd.read_csv(df_path, low_memory=False, dtype=str)
#Split into test and training data
train_df, test_df = train_test_split(NIH_df, test_size=test_size)
print("Train and test data>>", len(train_df), len(test_df))
#The 15 class names the generators should use
class_list = ["No Finding", "Edema", "Atelectasis", "Consolidation", "Infiltration", "Effusion",
              "Hernia", "Pneumothorax", "Pneumonia", "Mass", "Nodule", "Emphysema",
              "Pleural_Thickening", "Cardiomegaly", "Fibrosis"]
Train and test data>> 89696 22424
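For reference, the target that a multi-label setup is meant to produce for one image can be sketched as a multi-hot vector over the class list. This is only an illustration: `multi_hot` is a hypothetical helper, not part of Keras, and the index order here simply follows `class_list` (the mapping a generator actually uses can be read from its `class_indices` attribute).

```python
class_list = ["No Finding", "Edema", "Atelectasis", "Consolidation", "Infiltration", "Effusion",
              "Hernia", "Pneumothorax", "Pneumonia", "Mass", "Nodule", "Emphysema",
              "Pleural_Thickening", "Cardiomegaly", "Fibrosis"]


def multi_hot(labels, classes):
    """Return a 0/1 vector with a 1 at the position of every label in `labels`.

    Hypothetical helper for illustration; positions follow the order of `classes`.
    """
    present = set(labels)
    return [1 if name in present else 0 for name in classes]


# An image tagged with two findings gets two 1s in its target vector.
vector = multi_hot(["Edema", "Mass"], class_list)
print(vector)
```

An image with a single finding yields a vector with exactly one 1, so the single-label case is just a special case of the same encoding.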
The ImageDataGenerator is set up and called like this:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

#Create training array
#On the fly with keras flow_from_dataframe
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory='/mnt/dsets/ChestXrays/NIH/images',
    x_col="Image Index",
    y_col='labels',
    has_ext=True,
    batch_size=batch_size,
    seed=42,
    class_mode='categorical',
    classes=class_list,
    #color_mode = 'grayscale',
    target_size=(img_rows, img_cols))
valid_generator = test_datagen.flow_from_dataframe(
    dataframe=test_df,
    directory=img_path,
    x_col="Image Index",
    y_col='labels',
    #subset="validation",
    batch_size=1,#batch_size,
    seed=42,
    class_mode='categorical',
    classes=class_list,
    #color_mode = 'grayscale',
    target_size=(img_rows, img_cols))
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
STEP_SIZE_VALID=valid_generator.n//valid_generator.batch_size
Result:
Found 0 validated image filenames belonging to 15 classes.
Found 0 validated image filenames belonging to 15 classes.
When I comment out "classes=class_list" like this:
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory='/mnt/dsets/ChestXrays/NIH/images',
    x_col="Image Index",
    y_col='labels',
    has_ext=True,
    batch_size=batch_size,
    seed=42,
    class_mode='categorical',
    #classes=class_list,
    #color_mode = 'grayscale',
    target_size=(img_rows, img_cols))
valid_generator = test_datagen.flow_from_dataframe(
    dataframe=test_df,
    directory=img_path,
    x_col="Image Index",
    y_col='labels',
    #subset="validation",
    batch_size=1,#batch_size,
    seed=42,
    class_mode='categorical',
    #classes=class_list,
    #color_mode = 'grayscale',
    target_size=(img_rows, img_cols))
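it finds the images, but not the correct classes:
Found 89695 validated image filenames belonging to 769 classes.
Found 22424 validated image filenames belonging to 426 classes.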
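Both results would be consistent with each `labels` cell being one plain string that encodes several class names at once, which is what `dtype=str` in `pd.read_csv` tends to produce. The sketch below only illustrates that hypothesis with plain Python: the sample strings in `raw_labels` are invented (the real format of the column is not shown in this post), and the two helpers are hypothetical stand-ins, not Keras functions.

```python
class_list = ["No Finding", "Edema", "Atelectasis"]  # shortened for the sketch

# Hypothetical raw cell values; the actual format of the column is an assumption.
raw_labels = [
    "['No Finding']",
    "['Edema', 'Atelectasis']",
    "['Atelectasis', 'Edema']",
    "['No Finding']",
]


def exact_matches(values, classes):
    """Keep only the values that are, as whole strings, one of the class names."""
    allowed = set(classes)
    return [value for value in values if value in allowed]


def distinct_values(values):
    """Return the distinct raw strings, i.e. what would each count as one class."""
    return sorted(set(values))


# With an explicit class list, no whole string matches a class name.
print(exact_matches(raw_labels, class_list))   # prints []
# Without a class list, every distinct string becomes its own "class".
print(len(distinct_values(raw_labels)))        # prints 3
```

If the real column looks like this, the 0 images and the 769 classes would be two symptoms of the same thing: the label combinations are being compared as whole strings rather than as lists of class names.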
Is there any way to fix this?
Thanks!
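One approach that may be worth trying, sketched under assumptions: convert every `labels` cell into a real Python list of class names before building the generators, since `flow_from_dataframe` with `class_mode='categorical'` accepts list-valued `y_col` entries for multi-label data. `parse_labels` is a hypothetical helper, and the two input formats it handles (a stringified list, or names separated by `|`) are guesses about the CSV, not something confirmed here. Missing values are not handled.

```python
import ast


def parse_labels(raw):
    """Turn one raw label string into a list of class names.

    Assumed input formats (guesses, not confirmed by the post):
      "['Edema', 'Mass']"  -> stringified Python list
      "Edema|Mass"         -> pipe-separated names
    A single name such as "No Finding" becomes a one-element list.
    """
    text = raw.strip()
    if text.startswith("["):
        # literal_eval safely evaluates the list literal without running code
        return [str(item).strip() for item in ast.literal_eval(text)]
    return [part.strip() for part in text.split("|") if part.strip()]


# Applied to the DataFrames from the post before creating the generators:
# train_df["labels"] = train_df["labels"].apply(parse_labels)
# test_df["labels"] = test_df["labels"].apply(parse_labels)

print(parse_labels("['Edema', 'Mass']"))
print(parse_labels("Edema|Mass"))
```

With list-valued labels, keeping `classes=class_list` should let the generator build one multi-hot target per image instead of treating each combination as a separate class; checking `train_generator.class_indices` afterwards shows which mapping was actually used.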