Keras - Feeding 3-channel images into an LSTM

Time: 2017-12-06 10:12:40

Tags: python keras lstm recurrent-neural-network

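I have read a sequence of images into a numpy array of shape (7338, 225, 1024, 3), where 7338 is the number of samples, 225 is the number of time steps, 1024 (32x32) is the number of flattened image pixels, and 3 is the number of channels (RGB).

I have a Sequential model with an LSTM layer: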

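from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(128, input_shape=(225, 1024, 3)))  # 4D with the batch axis, so this raises the error below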

But this results in the error:

Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

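The documentation says that the input tensor for an LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case input_dim is 2D.

What is the recommended way to feed 3-channel images into an LSTM layer in Keras?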

2 Answers:

Answer 0: (score: 6)

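If you want the images to form a sequence (like the frames of a movie), you need to treat the pixels and the channels together as features, which gives 1024 * 3 = 3072 features per time step: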

input_shape = (225, 3072)  # a 3D input once the batch axis is added; the batch size 7338 is left out
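The answer does not show how the data itself gets into that shape. A minimal sketch, assuming the array is laid out as described in the question, is to merge the last two axes with a numpy reshape. The array name `images` and the reduced sample count are placeholders chosen for illustration, not taken from the original post.

```python
import numpy as np

# Small stand-in for the (7338, 225, 1024, 3) array from the question.
# Only 4 samples are used here to keep the sketch light.
images = np.zeros((4, 225, 1024, 3), dtype="float32")

# Merge the pixel axis and the channel axis: 1024 * 3 = 3072 features per time step.
sequences = images.reshape(images.shape[0], images.shape[1], -1)

# The per-sample shape is what would be passed to the LSTM as input_shape.
input_shape = sequences.shape[1:]
print(sequences.shape, input_shape)
```

The reshape keeps the sample and time-step axes untouched, so each row of 3072 values still belongs to a single frame.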

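If you want more processing before the 3072 features are fed into the LSTM, you can combine or interleave 2D convolutions and LSTMs for a more refined model (not necessarily a better one, since every application behaves differently).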

You can also try the new ConvLSTM2D layer, which takes a five-dimensional input:

input_shape = (225, 32, 32, 3)  # a 5D input once the batch axis is added; the batch size 7338 is left out
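As a rough illustration of the five-dimensional route, the sketch below restores the 32x32 grid and passes the frames through a single ConvLSTM2D layer. It assumes the 1024 flattened pixels are stored row by row, and the filter count, kernel size, and pooling layer are arbitrary choices made for this example rather than anything recommended in the answer.

```python
import numpy as np
from tensorflow import keras

# Stand-in for the question's data, with 2 samples instead of 7338.
images = np.zeros((2, 225, 1024, 3), dtype="float32")

# Restore the spatial grid: (samples, time steps, height, width, channels).
# Assumes the 1024 pixels were flattened row by row from a 32x32 image.
frames = images.reshape(images.shape[0], 225, 32, 32, 3)

model = keras.Sequential([
    keras.Input(shape=(225, 32, 32, 3)),       # batch size is left out
    keras.layers.ConvLSTM2D(filters=16, kernel_size=(3, 3), padding="same"),
    keras.layers.GlobalAveragePooling2D(),     # one 16-value vector per sequence
])
model.summary()
```

With the default return_sequences=False, ConvLSTM2D emits only the final state as a feature map, which the pooling layer then collapses into a vector.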

I would probably build a convolutional network with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) layers, followed by a TimeDistributed(Flatten()) and finally the LSTM(). This will very likely improve both the model's understanding of the images and the performance of the LSTM.
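One possible arrangement of those layers is sketched below. The number of convolution blocks, the filter counts, and the kernel sizes are assumptions made for illustration; the answer only names the layer types and their order.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(225, 32, 32, 3)),  # (time steps, height, width, channels)
    # TimeDistributed applies the same 2D layer to each of the 225 frames.
    layers.TimeDistributed(layers.Conv2D(16, (3, 3), activation="relu", padding="same")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),   # 32x32 -> 16x16
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu", padding="same")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),   # 16x16 -> 8x8
    layers.TimeDistributed(layers.Flatten()),              # 8 * 8 * 32 = 2048 features per step
    layers.LSTM(128),                                      # receives (batch, 225, 2048)
])
model.summary()
```

Here the LSTM sees 2048 learned features per time step instead of 3072 raw pixel values, and the data must be reshaped to (samples, 225, 32, 32, 3) before training.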

Answer 1: (score: 1)

The Keras guides now include a section on building RNNs with nested structures, which allows arbitrary input types at each time step: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs