I am fine-tuning the GoogleNet network to my own dataset using Caffe. If I use an IMAGE_DATA layer as input, learning takes place. However, I need to switch to an HDF5 layer for further extensions that I require. When I use the HDF5 layer, no learning takes place.
I am using the exact same input images, and the labels match as well. I have also checked to ensure that the data in the .h5 files can be loaded correctly. It can, and Caffe is also able to find the number of examples I feed it, as well as the correct number of classes (2).
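For reference, the kind of load-check described above can be sketched with h5py, assuming the datasets were written under the names `data` and `label` (the tops Caffe's HDF5 layer reads); the file name and shapes here are only illustrative:

```python
import numpy as np
import h5py

# Write a tiny dummy dataset the way Caffe's HDF5 layer expects it:
# 'data' as float32 NxCxHxW, 'label' as one float32 value per example.
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=np.random.rand(4, 3, 224, 224).astype('float32'))
    f.create_dataset('label', data=np.array([0, 1, 0, 1], dtype='float32'))

# Read it back and verify shape, dtype, and label alignment.
with h5py.File('train.h5', 'r') as f:
    data = f['data'][:]
    label = f['label'][:]

assert data.shape == (4, 3, 224, 224)
assert data.dtype == np.float32
assert len(label) == len(data)
print('classes found:', np.unique(label))
```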
This leads me to believe that the problem lies in the transformations I perform manually (since the HDF5 layer performs no built-in transformations). The code for these is below. I do the following:

Can anyone see anything wrong with this approach? Is data augmentation so essential that no learning takes place without it?
import numpy as np
import caffe
from scipy import ndimage

IMG_RESHAPE = 224
IMG_UNCROPPED = 256

def resize_convert(img_names, path=None, oversample=False):
    '''
    Load images, set to BGR mode and transpose to CxHxW
    and subtract the ImageNet mean. If oversample is True,
    perform data augmentation.

    Parameters:
    ---------
    img_names (list): list of image names to be processed.
    path (string): path to images.
    oversample (bool): if True then data augmentation is performed
        on each image, and 10 crops of size 224x224 are produced
        from each image. If False, then a single 224x224 image is produced.
    '''
    path = path if path is not None else ''

    if oversample == False:
        all_imgs = np.empty((len(img_names), 3, IMG_RESHAPE, IMG_RESHAPE), dtype='float32')
    else:
        all_imgs = np.empty((len(img_names), 3, IMG_UNCROPPED, IMG_UNCROPPED), dtype='float32')

    # Load the ImageNet mean
    mean_val = np.load('/path/to/imagenet/ilsvrc_2012_mean.npy')

    for i, img_name in enumerate(img_names):
        img = ndimage.imread(path + img_name, mode='RGB')  # Read as HxWxC

        # Subtract the ImageNet mean.
        # First, resize to 256 so we can subtract the 256x256 mean.
        img = img[..., ::-1]  # Convert RGB to BGR
        img = caffe.io.resize_image(img, (IMG_UNCROPPED, IMG_UNCROPPED), interp_order=1)
        img = np.transpose(img, (2, 0, 1))  # HxWxC => CxHxW
        # The mean is given in Caffe channel order (3xHxW);
        # assume it is also given in BGR order.
        img = img - mean_val

        # Set to 0-1 range => I don't think GoogleNet requires this.
        # I tried both and it didn't make a difference.
        #img = img / 255

        # Resize images down, since GoogleNet accepts 224x224 crops
        if oversample == False:
            img = np.transpose(img, (1, 2, 0))  # CxHxW => HxWxC
            img = caffe.io.resize_image(img, (IMG_RESHAPE, IMG_RESHAPE), interp_order=1)
            img = np.transpose(img, (2, 0, 1))  # Convert back to CxHxW for Caffe

        all_imgs[i, :, :, :] = img

    # Oversampling requires NxHxWxC order
    if oversample:
        all_imgs = np.transpose(all_imgs, (0, 2, 3, 1))  # NxCxHxW => NxHxWxC
        all_imgs = caffe.io.oversample(all_imgs, (IMG_RESHAPE, IMG_RESHAPE))
        all_imgs = np.transpose(all_imgs, (0, 3, 1, 2))  # Back to NxCxHxW for Caffe

    return all_imgs
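The output of a function like this would then be written to an .h5 file. A minimal sketch of that step with h5py, assuming dummy stand-in arrays in place of the real `resize_convert` output (file and dataset names are only what Caffe's HDF5 layer conventionally reads):

```python
import numpy as np
import h5py

# Hypothetical stand-ins for resize_convert() output:
# float32 NxCxHxW images plus one float32 label per image.
imgs = np.zeros((2, 3, 224, 224), dtype='float32')
labels = np.array([0, 1], dtype='float32')

# Dataset names must match the 'data' and 'label' tops of the HDF5 layer.
with h5py.File('train_part1.h5', 'w') as f:
    f.create_dataset('data', data=imgs)
    f.create_dataset('label', data=labels)

# Read back to confirm the round trip preserved the shape.
with h5py.File('train_part1.h5', 'r') as f:
    stored_shape = f['data'].shape
```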
name: "GoogleNet"
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/train_list.txt"
    batch_size: 32
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/path/to/valid_list.txt"
    batch_size: 10
  }
  include: { phase: TEST }
}
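Note that the `source` in `hdf5_data_param` is not an .h5 file itself, but a plain text file listing one HDF5 file path per line, for example (hypothetical paths):

```
/path/to/train_part1.h5
/path/to/train_part2.h5
```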
When I say that no learning takes place, I mean that my training loss does not decrease consistently when using HDF5 data, compared to IMG_DATA. In the images below, the first plot shows the change in training loss for the IMG_DATA network, and the other for the HDF5-data network.
One possibility I am considering is that the network is overfitting to each .h5 file I feed it. I am currently using data augmentation, but all the augmented versions of a single input image are stored together in the same .h5 file, which I think could cause the network to overfit to that particular .h5 file. However, I am not sure whether that is what the second plot suggests.
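One way to rule that out would be to shuffle the augmented examples, applying the same permutation to the labels so they stay aligned, before writing them out, so that each .h5 file mixes crops from many source images. A sketch using numpy, with hypothetical stand-in arrays:

```python
import numpy as np

# Hypothetical stand-ins for the augmented images (NxCxHxW) and labels (N).
imgs = np.arange(10 * 3 * 2 * 2, dtype='float32').reshape(10, 3, 2, 2)
labels = np.arange(10, dtype='float32')

# One shared permutation keeps image i paired with label i after shuffling.
rng = np.random.RandomState(0)
perm = rng.permutation(len(imgs))
imgs_shuf = imgs[perm]
labels_shuf = labels[perm]

# Alignment check: every shuffled label still matches its image.
assert (labels_shuf == labels[perm]).all()
assert (imgs_shuf[0] == imgs[perm[0]]).all()
```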
Answer 0: (score 0)
I ran into the same problem, and found that for some reason doing the transformations manually in the code, as you do, results in images that are all black (all zeros). Try debugging your code and see if this is happening. The solution is to use the same approach explained in the Caffe tutorial:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
The part where you see:
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
and then, a few lines further down:
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
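To catch the all-black symptom described above, one could assert on simple statistics of the preprocessed array before writing it to HDF5. A sketch; the helper name is made up and `ok` stands in for any preprocessed CxHxW array:

```python
import numpy as np

def check_not_black(img, name='image'):
    """Raise if a preprocessed image is (near-)all-zero,
    which would render as an all-black input."""
    if np.abs(img).max() < 1e-6:
        raise ValueError('%s is all zeros (black)' % name)
    return img.min(), img.max(), img.mean()

# Hypothetical mean-subtracted CxHxW image: should pass the check.
ok = np.random.randn(3, 224, 224).astype('float32')
print(check_not_black(ok))

# An all-zero image triggers the error.
try:
    check_not_black(np.zeros((3, 224, 224), dtype='float32'))
except ValueError as e:
    print(e)  # -> image is all zeros (black)
```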