Question

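I have read a series of images into a numpy array of shape (7338, 225, 1024, 3), where 7338 is the number of samples, 225 is the number of time steps, 1024 (32x32) is the number of flattened image pixels, and 3 is the number of channels (RGB).

I have a Sequential model with an LSTM layer: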

from keras.models import Sequential
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(128, input_shape=(225, 1024, 3)))

But this results in the error:

Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

The documentation says that the input to an LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case input_dim is 2D.

What is the suggested way to feed 3-channel images into an LSTM layer in Keras?

Answer 1

If you want the images to form a sequence (like a movie with frames), you need to treat both the pixels AND the channels as features:

input_shape = (225, 3072)  # a 3D input; the batch size (7338) is not specified
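Treating pixels and channels as features amounts to merging the last two axes of the array, since 1024 * 3 = 3072. A minimal sketch with numpy, assuming a small stand-in array (`frames` and `flat` are illustrative names, and only 2 samples are used instead of 7338 to keep memory low):

```python
import numpy as np

# Small stand-in for the (7338, 225, 1024, 3) array from the question.
frames = np.zeros((2, 225, 1024, 3), dtype=np.float32)

# Merge the pixel axis and the channel axis into one feature axis.
# -1 lets numpy infer the size: 1024 * 3 = 3072.
flat = frames.reshape(frames.shape[0], frames.shape[1], -1)

print(flat.shape)  # (2, 225, 3072)
```

The reshaped array has the (batch_size, timesteps, input_dim) layout the LSTM layer expects, so the per-sample shape passed to the model is (225, 3072).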

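If you want more processing before feeding the 3072 features into an LSTM, you can combine or interleave 2D convolutions and LSTMs for a more refined model (not necessarily a better one, though; each application has its own behavior).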

You can also try the newer ConvLSTM2D layer, which takes five-dimensional input:

input_shape = (225, 32, 32, 3)  # a 5D input; the batch size (7338) is not specified

I would probably build a convolutional network with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) layers, then add a TimeDistributed(Flatten()) and finally the LSTM(). This will very likely improve both the model's understanding of the images and the performance of the LSTM.
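The two options described in this answer might look roughly like the sketch below. It assumes TensorFlow's bundled Keras (`tensorflow.keras`). The function names, filter counts, kernel sizes, and the pooling layer after ConvLSTM2D are illustrative assumptions, not values from the answer.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn_lstm(timesteps=225, height=32, width=32, channels=3):
    """Per-frame CNN followed by an LSTM over the frame sequence."""
    return keras.Sequential([
        keras.Input(shape=(timesteps, height, width, channels)),
        # TimeDistributed applies the wrapped layer to every frame independently.
        layers.TimeDistributed(layers.Conv2D(16, 3, padding="same", activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        layers.TimeDistributed(layers.Conv2D(32, 3, padding="same", activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        # Flatten each frame's feature map so the LSTM sees (timesteps, features).
        layers.TimeDistributed(layers.Flatten()),
        layers.LSTM(128),
    ])


def build_convlstm(timesteps=225, height=32, width=32, channels=3):
    """ConvLSTM2D consumes the unflattened frames directly."""
    return keras.Sequential([
        keras.Input(shape=(timesteps, height, width, channels)),
        # With return_sequences left at False, only the last state is returned.
        layers.ConvLSTM2D(16, 3, padding="same"),
        # Collapse the spatial axes into one vector per sample (an assumption).
        layers.GlobalAveragePooling2D(),
    ])


model = build_cnn_lstm()
model.summary()
```

Both variants expect the images unflattened, so the array from the question would first be reshaped from (7338, 225, 1024, 3) to (7338, 225, 32, 32, 3).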

Answer 2

The Keras guide now includes a section on creating RNNs with nested structures, which allows arbitrary input types at each time step: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs

Keras - Feeding 3-channel images into an LSTM
