gcloud ml-engine bucket with images

Time: 2018-04-08 14:10:03

Tags: google-cloud-storage google-cloud-ml

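I am trying to train a model on gcloud. I uploaded my data to a GCS bucket, in a folder called Pokemon. The data does not need any labels because I am doing unsupervised learning. Running the code locally works, but when I try to train on gcloud I have trouble getting the data loaded correctly.

Here is my task code: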

import tensorflow as tf 
import argparse
import numpy as np 
import trainer.model as model
from tensorflow.contrib.training.python.training import hparam


def run_experiment(hparams):
    train_input = model.input_fn(hparams.train_dir)

    # Transpose the RGB channels into 3 independent images,
    # then flatten all pixels into one dimension
    X_flat = np.transpose(train_input, (0,3,1,2))
    X_flat = X_flat.reshape(2376, 1600)

    print('Original image shape:  {0}\nFlattened image shape:  {1}'.format(train_input.shape, X_flat.shape))

    print('Constructing model')

    # tf Graph input (only pictures)
    X = tf.placeholder("float", [None, model.n_input])
    # Construct model
    encoder_op = model.encoder(X)
    variation_op, KLD, epsilon, layer_mu = model.variation(encoder_op)
    decoder_op = model.decoder(variation_op)
    # Prediction
    y_pred = decoder_op
    # Targets (labels) are the input data
    y_true = X

    # Define loss and optimizer
    l2_loss = tf.add_n([tf.nn.l2_loss(model.weights[w]) for w in model.weights])
    BCE = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_pred, labels=y_true), reduction_indices=1)

    cost = tf.reduce_mean(BCE+KLD)+model.l2_lambda*l2_loss
    optimizer = tf.train.RMSPropOptimizer(model.learning_rate).minimize(cost)

    # Init variables
    init = tf.global_variables_initializer()
    # Create session and graph, init variables
    sess = tf.InteractiveSession()
    sess.run(init)
    total_batch = int(X_flat.shape[0]/model.batch_size)
    # Training cycle
    for epoch in range(model.training_epochs):
        # Loop over all batches
        start = 0; end = model.batch_size
        for i in range(total_batch-1):
            index = np.arange(start, end)
            np.random.shuffle(index)
            batch_xs = X_flat[index]
            start = end; end = start+model.batch_size
            # Run optimization op (backprop) and loss op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
        # Display logs per epoch step
        if ((epoch == 0) or (epoch+1) % model.display_step == 0) or ((epoch+1) == model.training_epochs):
            print('Epoch: {0:04d}   loss: {1:f}'.format(epoch+1, c))
    print("Optimization finished")

    # Save trained variables
    weightSaver = tf.train.Saver(var_list=model.weights)
    biaseSaver = tf.train.Saver(var_list=model.biases)
    save_path = weightSaver.save(sess, hparams.job_dir+"/VAE_weights.ckpt")
    save_path = biaseSaver.save(sess, hparams.job_dir+"/VAE_biases.ckpt")


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Input Arguments
    parser.add_argument(
        '--train-dir',
        help='GCS or local paths to training data',
        nargs='+',
        required=True
    )

    parser.add_argument(
        '--job-dir',
        help='GCS location to write checkpoints and export models',
        required=True
    )
    args = parser.parse_args()

    hparams = hparam.HParams(**args.__dict__)

    run_experiment(hparams)

Here is the input_fn:

def input_fn(dir):
    images = np.empty((0, 40, 40, 3), dtype='float32')
    for pic in glob.glob(dir[0]+'/*.png'):
        img = mpimg.imread(pic)
        # remove alpha channel  %some alpha=0 but RGB is not equal to [1., 1., 1.]
        img[img[:,:,3]==0] = np.ones((1,4))
        img = img[:,:,:3]
        images = np.append(images, [img], axis=0)

    return images

My problem is that when I start training with:

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.4 \
    --module-name trainer.train_task \
    --package-path trainer/ \
    --region $REGION \
    -- \
    --train-dir $TRAIN_DATA 

with TRAIN_DATA=gs://$BUCKET_NAME/Pokemon

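I get this error: ValueError: cannot reshape array of size 0 into shape (2376,1600). This means that it does not pick up any images. If I run it locally with the absolute path to the locally stored Pokemon folder, the same code works.

Does anyone know what I am doing wrong?

All the best.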

1 answer:

Answer 0 (score: 1)

This question is similar to this one, but that one does not directly cover matplotlib's imread function.

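In short, regular Python file operations (such as glob.glob), and any function that uses regular Python file operations internally (in this case Matplotlib's imread function, which uses Python's open function), cannot handle GCS. More information can be found in this answer.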
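To make the failure concrete, here is a minimal sketch of what likely happens inside the training job. The bucket name `example-bucket` is a made-up placeholder, and the array shapes are copied from the question. `glob.glob` does not raise on a `gs://` path; it simply finds no local files, so the error only surfaces later at the reshape.

```python
import glob
import numpy as np

# Hypothetical bucket name, used only for illustration.
pattern = 'gs://example-bucket/Pokemon/*.png'

# glob treats the pattern as a local path, finds nothing that matches,
# and returns an empty list instead of raising an error.
matches = glob.glob(pattern)
print(matches)

# With no matches, the loop in input_fn never runs and the array stays empty.
images = np.empty((0, 40, 40, 3), dtype='float32')

message = None
try:
    np.transpose(images, (0, 3, 1, 2)).reshape(2376, 1600)
except ValueError as err:
    message = str(err)
    print(message)
```

Checking that the list of matched files is non-empty before reshaping would turn this into a much clearer error message.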

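Applying that information to your case, and taking advantage of the fact that imread lets you pass in a file-like object, you need something like the following: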

import matplotlib.image as mpimg
import numpy as np
import tensorflow as tf
from tensorflow.python.lib.io import file_io

def input_fn(dir):
  images = np.empty((0, 40, 40, 3), dtype='float32')
  for pic in file_io.get_matching_files(dir[0]+'/*.png'):
    with file_io.FileIO(pic, 'rb') as f:
      img = mpimg.imread(f)
    # remove alpha channel  %some alpha=0 but RGB is not equal to [1., 1., 1.]
    img[img[:,:,3]==0] = np.ones((1,4))
    img = img[:,:,:3]
    images = np.append(images, [img], axis=0)

  return images
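
As a side note, once the images load, the hard-coded `reshape(2376, 1600)` in the task code only fits one exact number of 40x40 images. The sketch below is not part of the original code; it shows the same alpha-channel handling and channel flattening with the shape derived from the data. The helper names `strip_alpha` and `flatten_channels` and the random stand-in images are assumptions made for illustration.

```python
import numpy as np

def strip_alpha(img):
    """Paint fully transparent pixels white, then drop the alpha channel."""
    img = np.array(img, dtype='float32')   # copy so the caller's array is untouched
    img[img[:, :, 3] == 0] = 1.0           # alpha == 0 -> all four channels set to 1
    return img[:, :, :3]

def flatten_channels(images):
    """Turn (N, H, W, 3) into (N * 3, H * W), one row per colour channel."""
    n, h, w, c = images.shape
    return np.transpose(images, (0, 3, 1, 2)).reshape(n * c, h * w)

# Hypothetical stand-in data: five random 40x40 RGBA images.
rng = np.random.RandomState(0)
rgba = rng.rand(5, 40, 40, 4).astype('float32')
rgba[:, :10, :10, 3] = 0.0                 # make one corner fully transparent

# Collect the processed images and stack them once, rather than
# calling np.append inside the loop.
images = np.stack([strip_alpha(img) for img in rgba])
X_flat = flatten_channels(images)
print(images.shape, X_flat.shape)
```

With five input images this yields 15 rows of 1600 values each, so the row count follows the number of files found rather than a fixed constant.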