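Model
I have been trying to train a VGG_FACE_16_layers network from scratch, following the steps of this project:
https://github.com/danduncan/HappyNet
I then trained a face/emotion recognition model on a cloud GPU from the mdb files, which produced the file my_face.caffemodel.
It has 6 labels, and although its prediction accuracy is not the best, the model appears to work.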
import caffe

net = caffe.Net('models/vitor_face/deploy.prototxt',
                'models/vitor_face/my_face.caffemodel',
                caffe.TEST)
# weights and biases of the fc7 layer
W = net.params['fc7'][0].data[...]
b = net.params['fc7'][1].data[...]
W and b seem to print valid values for all layers.
And the architecture (blob shapes first, then weight and bias shapes):
[('data', (1, 3, 224, 224)), ('conv1', (1, 96, 111, 111)), ('norm1', (1, 96, 111, 111)), ('pool1', (1, 96, 37, 37)), ('conv2', (1, 256, 37, 37)), ('pool2', (1, 256, 19, 19)), ('conv3', (1, 512, 19, 19)), ('conv4', (1, 512, 19, 19)), ('conv5', (1, 512, 19, 19)), ('pool5', (1, 512, 7, 7)), ('fc6', (1, 4048)), ('fc7', (1, 4048)), ('fc8', (1, 6)), ('prob', (1, 6))]
[('conv1', (96, 3, 7, 7), (96,)), ('conv2', (256, 96, 5, 5), (256,)), ('conv3', (512, 256, 3, 3), (512,)), ('conv4', (512, 512, 3, 3), (512,)), ('conv5', (512, 512, 3, 3), (512,)), ('fc6', (4048, 25088), (4048,)), ('fc7', (4048, 4048), (4048,)), ('fc8_cat', (6, 4048), (6,))]
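To make "valid values" a little more concrete, here is a minimal sketch of a check that could be run on each layer's W and b. The helper name `looks_valid` is hypothetical (it is not part of Caffe), and the random arrays only stand in for `net.params[name][0].data` and `net.params[name][1].data`.

```python
import numpy as np

def looks_valid(W, b):
    """Hypothetical helper: True if W and b are finite and W is not all zeros."""
    finite = np.isfinite(W).all() and np.isfinite(b).all()
    return bool(finite and np.any(W != 0))

# stand-in arrays with the fc8_cat shapes listed above: (6, 4048) and (6,)
rng = np.random.RandomState(0)
W_demo = rng.normal(0.0, 0.01, size=(6, 4048)).astype(np.float32)
b_demo = np.zeros(6, dtype=np.float32)
demo_ok = looks_valid(W_demo, b_demo)
```

With a loaded net, the same call would be made once per layer name in `net.params`. Passing this check only rules out NaN, infinite, or all-zero weights; it says nothing about whether the weights are well trained.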
Goal
My goal is to use layers of the network as objective ends to generate dreams, so that the faces used in training "appear" on an input image, using this code:
import numpy as np
import scipy.ndimage as nd
import PIL.Image
import caffe
from IPython.display import clear_output

model_path = 'happyNet/models/vitor_face/'  # substitute your path here
net_fn = model_path + 'deploy.prototxt'
param_fn = model_path + 'my_face.caffemodel'

# load the training-set mean image (note: MEAN is not passed to the Classifier below)
MEAN_FILE = model_path + 'mean_training_image.binaryproto'
proto_data = open(MEAN_FILE, "rb").read()
a = caffe.io.caffe_pb2.BlobProto.FromString(proto_data)
MEAN = caffe.io.blobproto_to_array(a)[0]

net = caffe.Classifier(net_fn,
                       param_fn,
                       mean=np.float32([104.0, 116.0, 122.0]),  # ImageNet mean, training set dependent
                       channel_swap=(2, 1, 0))  # the reference model has channels in BGR order instead of RGB
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']

def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])

def objective_L2(dst):
    dst.diff[:] = dst.data
def make_step(net, step_size=1.5, end='fc7',
              jitter=32, clip=True, objective=objective_L2):
    '''Basic gradient ascent step.'''
    src = net.blobs['data']  # input image is stored in Net's 'data' blob
    dst = net.blobs[end]
    #print ('src.data', src.data)
    # print ('PERCENTILE',np.percentile(net.blobs[end].data[0], (0, 10, 50, 90, 100)))
    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2)  # apply jitter shift
    net.forward(end=end)
    objective(dst)  # specify the optimization objective
    net.backward(start=end)
    g = src.diff[0]
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g
    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2)  # unshift image
    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)
def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4,
              end='fc7', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    for i in range(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale, 1.0/octave_scale), order=1))
    src = net.blobs['data']
    #print src.data
    # print blob names and shapes
    print([(k, v.data.shape) for k, v in net.blobs.items()])
    # print weight and bias shapes
    print([(k, v[0].data.shape, v[1].data.shape) for k, v in net.params.items()])
    detail = np.zeros_like(octaves[-1])  # allocate image for network-produced details
    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1, 1.0*w/w1), order=1)
        src.reshape(1, 3, h, w)  # resize the network's input image size
        src.data[0] = octave_base+detail
        for i in range(iter_n):
            make_step(net, end=end, clip=clip, **step_params)
            # visualization
            vis = deprocess(net, src.data[0])
            if not clip:  # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)  # display helper defined elsewhere (not shown in this post)
            # save images to disk
            PIL.Image.fromarray(np.uint8(vis)).save('results/{}_{}_{}.png'.format(octave, i, vis.shape))
            print(octave, i, end, vis.shape)
            clear_output(wait=True)
        # extract details produced on the current octave
        detail = src.data[0]-octave_base
    # returning the resulting image
    return deprocess(net, src.data[0])
Then I run it, expecting the trained faces to appear on the cloud image, like this:
img = np.float32(PIL.Image.open('img/clouds.jpg'))
_ = deepdream(net, img, end='fc7')
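Because deepdream reshapes the 'data' blob to the size of each octave, the network does not necessarily see a 224x224 input. The sketch below estimates the octave sizes for a given base image. The function name `octave_sizes` is hypothetical, the 224x224 base size is only an example (the size of clouds.jpg is not stated here), and it assumes that `nd.zoom` rounds each scaled dimension to the nearest integer.

```python
def octave_sizes(height, width, octave_n=4, octave_scale=1.4):
    """Approximate (h, w) of each octave, largest first.

    Assumes nd.zoom rounds each scaled dimension to the nearest integer.
    """
    sizes = [(height, width)]
    for _ in range(octave_n - 1):
        h, w = sizes[-1]
        sizes.append((int(round(h / octave_scale)), int(round(w / octave_scale))))
    return sizes

sizes = octave_sizes(224, 224)  # example base size only
```

deepdream iterates over these in reverse, so the smallest octave is the first one pushed through the network.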
Relevant files
deploy.prototxt
name: "VGG_FACE_16_layers"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "gaussian"
std: 0.01 # distribution with stdev 0.01 (default mean: 0)
}
bias_filler {
type: "constant" # initialize the biases to zero (0)
value: 0
}
}
}
layers {
name: "relu1"
type: RELU
bottom: "conv1"
top: "conv1"
}
layers {
name: "norm1"
type: LRN
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0005
beta: 0.75
}
}
layers {
name: "pool1"
type: POOLING
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 3
}
}
layers {
name: "conv2"
type: CONVOLUTION
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu2"
type: RELU
bottom: "conv2"
top: "conv2"
}
layers {
name: "pool2"
type: POOLING
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layers {
name: "conv3"
type: CONVOLUTION
bottom: "pool2"
top: "conv3"
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu3"
type: RELU
bottom: "conv3"
top: "conv3"
}
layers {
name: "conv4"
type: CONVOLUTION
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu4"
type: RELU
bottom: "conv4"
top: "conv4"
}
layers {
name: "conv5"
type: CONVOLUTION
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu5"
type: RELU
bottom: "conv5"
top: "conv5"
}
layers {
name: "pool5"
type: POOLING
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 3
}
}
layers {
name: "fc6"
type: INNER_PRODUCT
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4048
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu6"
type: RELU
bottom: "fc6"
top: "fc6"
}
layers {
name: "drop6"
type: DROPOUT
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc7"
type: INNER_PRODUCT
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4048
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu7"
type: RELU
bottom: "fc7"
top: "fc7"
}
layers {
name: "drop7"
type: DROPOUT
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc8_cat"
type: INNER_PRODUCT
bottom: "fc7"
top: "fc8"
inner_product_param {
num_output: 6
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "prob"
type: SOFTMAX
bottom: "fc8"
top: "prob"
}
train.prototxt
name: "CaffeNet"
layers {
name: "training_train"
type: DATA
data_param {
source: "/input/training_set_lmdb"
backend: LMDB
batch_size: 32
}
transform_param{
mean_file: "/input/mean_training_image.binaryproto"
}
top: "data"
top: "label"
include {
phase: TRAIN
}
}
layers {
name: "training_test"
type: DATA
data_param {
source: "/input/validation_set_lmdb"
backend: LMDB
batch_size: 15
}
transform_param{
mean_file: "/input/mean_training_image.binaryproto"
}
top: "data"
top: "label"
include {
phase: TEST
}
}
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu1"
type: RELU
bottom: "conv1"
top: "conv1"
}
layers {
name: "norm1"
type: LRN
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0005
beta: 0.75
}
}
layers {
name: "pool1"
type: POOLING
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 3
}
}
layers {
name: "conv2"
type: CONVOLUTION
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu2"
type: RELU
bottom: "conv2"
top: "conv2"
}
layers {
name: "pool2"
type: POOLING
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layers {
name: "conv3"
type: CONVOLUTION
bottom: "pool2"
top: "conv3"
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu3"
type: RELU
bottom: "conv3"
top: "conv3"
}
layers {
name: "conv4"
type: CONVOLUTION
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu4"
type: RELU
bottom: "conv4"
top: "conv4"
}
layers {
name: "conv5"
type: CONVOLUTION
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu5"
type: RELU
bottom: "conv5"
top: "conv5"
}
layers {
name: "pool5"
type: POOLING
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 3
}
}
layers {
name: "fc6"
type: INNER_PRODUCT
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4048
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layers {
name: "relu6"
type: RELU
bottom: "fc6"
top: "fc6"
}
layers {
name: "drop6"
type: DROPOUT
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc7"
type: INNER_PRODUCT
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4048
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu7"
type: RELU
bottom: "fc7"
top: "fc7"
}
layers {
name: "drop7"
type: DROPOUT
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layers {
name: "fc8_cat"
type: INNER_PRODUCT
bottom: "fc7"
top: "fc8_cat"
inner_product_param {
num_output: 6
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "prob"
type: SOFTMAX_LOSS
bottom: "fc8_cat"
bottom: "label"
}
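Traceback
The code above works with pretrained networks, but not with my own network trained from scratch.
Somehow it looks as if the network is not being forwarded all the way through, because I get a traceback with divisions by zero, like this: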
dream.py:178: RuntimeWarning: divide by zero encountered in divide
src.data[:] += step_size/np.abs(g).mean() * g
dream.py:178: RuntimeWarning: invalid value encountered in multiply
src.data[:] += step_size/np.abs(g).mean() * g
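(The resulting image is completely black, and if I manually add a bias to the gradient, g + .1, I only get the original image back, with no transformation at all.)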
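The two warnings are what NumPy reports when g is zero everywhere: the mean of |g| is 0, the division gives inf, and inf * 0 gives NaN. The sketch below reproduces that and shows one possible guard. The function name `ascent_update` and the `eps` threshold are assumptions for illustration, not part of the original code, and the guard only hides the symptom; it does not explain why the gradient is zero.

```python
import numpy as np

def ascent_update(g, step_size=1.5, eps=1e-8):
    """Normalized ascent update; returns zeros if the gradient has vanished."""
    scale = np.abs(g).mean()
    if scale < eps:
        return np.zeros_like(g)
    return step_size / scale * g

g_zero = np.zeros((3, 4, 4), dtype=np.float32)

# the unguarded expression from make_step, with the warnings silenced
with np.errstate(divide='ignore', invalid='ignore'):
    naive = 1.5 / np.abs(g_zero).mean() * g_zero

naive_is_nan = bool(np.isnan(naive).all())
guarded = ascent_update(g_zero)
```

Once NaN values are written into src.data they survive the clip step, which would be consistent with an unusable (black) output image.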
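In addition, I get the following InnerProduct warning, which suggests that my input_dim may be wrong by the time it reaches the classification layers. I don't understand why or how that could be, since the original project uses input_dim 224, 224: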
>275 inner_product_layer.cpp:64] Check failed: K_ == new_K (25088 vs. 20480) Input size incompatible with inner product parameters.
Anyway, I tried changing input_dim in deploy.prototxt to AlexNet's value of 227, but to no avail.
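As a sanity check on the numbers in that message, here is a minimal sketch that recomputes the number of inputs fc6 receives for a given image size, using the kernel, stride and pad values from deploy.prototxt. The function names are hypothetical, and the rounding rules (floor for convolution, ceil for pooling) are my assumption about how Caffe computes output sizes.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # assumed Caffe convolution rule: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # assumed Caffe pooling rule (no padding): ceil((size - kernel) / stride) + 1
    return int(math.ceil((size - kernel) / float(stride))) + 1

def pool5_size(size):
    size = conv_out(size, kernel=7, stride=2)   # conv1
    size = pool_out(size, kernel=3, stride=3)   # pool1
    size = conv_out(size, kernel=5, pad=2)      # conv2
    size = pool_out(size, kernel=2, stride=2)   # pool2
    for _ in range(3):                          # conv3, conv4, conv5
        size = conv_out(size, kernel=3, pad=1)
    return pool_out(size, kernel=3, stride=3)   # pool5

def fc6_inputs(height, width, channels=512):
    return channels * pool5_size(height) * pool5_size(width)

inputs_224 = fc6_inputs(224, 224)
inputs_227 = fc6_inputs(227, 227)
inputs_160 = fc6_inputs(160, 160)
```

Under these assumptions both 224 and 227 give 512 * 7 * 7 = 25088, which matches the fc6 weight shape, while a smaller input such as 160x160 gives 12800. The 20480 in the message equals 512 * 40 (for instance a 5x8 pool5 map), so it would be consistent with a non-square input other than 224x224 reaching fc6.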
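OBS: Since I want to use my model to generate dreams rather than to classify images, I wonder whether I should make the model FullyConnected all the way up to the top classification layers, as in this answer: Caffe: variable input-image size
If that is the way to make my model compatible with the GoogLeNet architecture and to activate the end= layers in my model, could someone please tell me how to do it?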
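If the linked answer is read as recasting the inner-product layers as convolutions (my assumption about what it proposes), the underlying idea can be checked on toy data: an inner product over a flattened C x k x k feature map gives the same result as a convolution with k x k kernels evaluated at a single position. The sizes below are small stand-ins for 512, 7 and 4048, and nothing here touches Caffe itself.

```python
import numpy as np

rng = np.random.RandomState(0)
channels, size, outputs = 4, 3, 5   # toy stand-ins for 512, 7 and 4048

x = rng.rand(channels, size, size)                 # one pooled feature map
W_fc = rng.rand(outputs, channels * size * size)   # inner-product weights
b = rng.rand(outputs)

# inner product on the flattened (C-order) feature map
fc_out = W_fc.dot(x.ravel()) + b

# the same weights viewed as `outputs` kernels of shape (channels, size, size)
W_conv = W_fc.reshape(outputs, channels, size, size)
conv_out = (W_conv * x).sum(axis=(1, 2, 3)) + b    # convolution at the single valid position

same = bool(np.allclose(fc_out, conv_out))
```

On a larger feature map the convolutional form would produce a spatial grid of outputs instead of failing a size check, which is what would let the input size vary.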
Any help would be greatly appreciated.
I am happy to share my model via chat on request.