How to use matrix-shaped inputs with dense layers in keras?

Date: 2020-05-29 08:18:18

Tags: python r tensorflow keras

To build a regression/prediction model, I would like to take a matrix of sensor readings (rows ~ sensors, columns ~ time points) and predict the future trend of these sensors.

Example implementation

# install.packages(c("keras", "tensorflow"))
library(keras)
library(tensorflow)

#' Prepare some training data mapping matrices to other smaller matrices where the response entries correspond to basic math
n = 1000000
nb = 10
mx = matrix(rnorm(6 * n, 0, 1), nrow = n, byrow = TRUE)
my = matrix(0, nrow = n, ncol = 3)
eps = 0.01

for (i in 1 : n) {
    x1 = mx[i, 1]; x2 = mx[i, 2]; x3 = mx[i, 3]; x4 = mx[i, 4]; x5 = mx[i, 5]; x6 = mx[i, 6];
    s1 = x1 * x1;   s2 = x2 * x2;   s3 = x3 * x3;   s4 = x4 * x4;   s5 = x5 * x5;   s6 = x6 * x6;
    zz = rnorm(1, 0, 1)

    my[i, 1] = (x1 + x2 + x3 + x4 + x5 + x6 + eps * zz)
    my[i, 2] = (s1 + s2 + eps * zz * zz)
    my[i, 3] = (x1 * s1 + s2 + x5 * s5 + x6 * s6 + eps * zz)
}

#' Recast into tf types
x_train = tf$constant(mx, shape = as.integer(c(n / nb, nb, 6)))
# FLATTENING the input would work (collapsing each nb x 6 matrix into one row):
# x_train = tf$constant(mx, shape = as.integer(c(n / nb, nb * 6)))
y_train = tf$constant(my, shape = as.integer(c(n / nb, nb, 3)))


#' Build the model
inputShape = dim(x_train)[- 1] 
outputShape = dim(y_train)[- 1]

model1 = keras_model_sequential() %>%
    layer_dense(units = 64, activation = "relu", input_shape = inputShape) %>%
    layer_dense(units = 256, activation = "relu") %>%
    layer_dense(units = prod(outputShape)) %>%
    layer_reshape(outputShape) %>%
    compile(loss = "mse", optimizer = "adam", metrics = list("mean_absolute_error", "mean_squared_error"))

model1 %>% summary
fit(model1, x_train, y_train, epochs = 3, validation_split = 0.2, verbose = 1)

model2 = keras_model_sequential() %>%
## tbd: layer_input --> layer_reshape --> layer_dense (which seems to work best with non-matrix-valued inputs)
    layer_dense(units = 64, activation = "relu", input_shape = inputShape) %>%
    layer_dense(units = 256, activation = "relu") %>%
    layer_dense(units = outputShape[2]) %>%
# layer_dense(units = prod(outputShape)) %>%
# layer_reshape(outputShape) %>%
    compile(loss = "mse", optimizer = "adam", metrics = list("mean_absolute_error", "mean_squared_error"))

model2 %>% summary

fit(model2, x_train, y_train, epochs = 3, validation_split = 0.2, verbose = 1)

The summary of model1 is

> model1 %>% summary
Model: "sequential_42"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #
================================================================================
dense_126 (Dense)                   (None, 10, 64)                  448
________________________________________________________________________________
dense_127 (Dense)                   (None, 10, 256)                 16640
________________________________________________________________________________
dense_128 (Dense)                   (None, 10, 30)                  7710
________________________________________________________________________________
reshape_29 (Reshape)                (None, 10, 3)                   0
================================================================================
Total params: 24,798
Trainable params: 24,798
Non-trainable params: 0
________________________________________________________________________________

The summary of model2 is

> model2 %>% summary
Model: "sequential_43"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #
================================================================================
dense_129 (Dense)                   (None, 10, 64)                  448
________________________________________________________________________________
dense_130 (Dense)                   (None, 10, 256)                 16640
________________________________________________________________________________
dense_131 (Dense)                   (None, 10, 3)                   771
================================================================================
Total params: 17,859
Trainable params: 17,859
Non-trainable params: 0
________________________________________________________________________________

Although both models have the same input and output shapes, model1 fails to train with:

Error in py_call_impl(callable, dots$args, dots$keywords) :
  ValueError: in user code:

    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    C:\Users\brandl\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:531 train_s

By flattening the input it works (see the commented-out line defining x_train). However, I would like to understand why matrix-shaped input values cannot be used in dense layers (or how to use them correctly).

Note: the example was written with https://keras.rstudio.com/, but since it is a 1:1 wrapper API, I am equally happy with a Python answer.
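For reference, the reshaping the example relies on can be sketched in NumPy (a minimal sketch with a smaller n for illustration; `np.reshape` stands in for `tf$constant(..., shape = ...)`):

```python
import numpy as np

n, nb = 1000, 10                     # smaller than the original n = 1000000
mx = np.random.normal(0.0, 1.0, size=(n, 6))

# matrix-shaped input: (batch, time points, sensors) = (n/nb, nb, 6)
x_train = mx.reshape(n // nb, nb, 6)

# the flattened alternative mentioned in the question: (n/nb, nb * 6)
x_train_flat = mx.reshape(n // nb, nb * 6)

print(x_train.shape)       # (100, 10, 6)
print(x_train_flat.shape)  # (100, 60)
```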

2 Answers:

Answer 0 (Score: 1)

According to the Dense documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense):

If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 1 of the kernel (using tf.tensordot).

So if the input tensor has shape (a, b, c) and the Dense layer has d units, the output tensor has shape (a, b, d). If you pass a tensor through several dense layers, only the last dimension changes.
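This behaviour can be sketched with `np.tensordot`, mimicking what Dense does for inputs of rank > 2 (a shape-only illustration with a random, untrained kernel; the sizes match the question's first dense layer):

```python
import numpy as np

a, b, c, d = 100, 10, 6, 64              # batch, time steps, features, units
x = np.random.normal(size=(a, b, c))
kernel = np.random.normal(size=(c, d))   # Dense kernel has shape (input_dim, units)
bias = np.zeros(d)

# contract the last axis of x with the first axis of the kernel
y = np.tensordot(x, kernel, axes=[[2], [0]]) + bias

print(y.shape)  # (100, 10, 64) -- only the last dimension changed
```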

Now, since the code runs with a flattened x, the underlying problem is a shape mismatch: indeed, y_train does not seem to have the same dimensions as the output of the network.

According to

x_train = tf$constant(mx, shape = as.integer(c(n / nb, nb, 6)))
y_train = tf$constant(my, shape = as.integer(c(n / nb, nb, 3)))

x_train and y_train have the same dimensions except for the last one. Then, for the predictions and y_train to have the same dimensions, your model should end with something like

layer_dense(units = outputShape[2]) %>%

instead of

layer_dense(units = prod(outputShape)) %>%
layer_reshape(outputShape) %>%

But this is just the technical side; I am not sure it is what you are after conceptually.
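The mismatch in model1 can be illustrated with plain NumPy (a sketch, not the author's code): with `prod(outputShape)` = 30 units, each sample leaving the last dense layer has shape (10, 30), i.e. 300 values, which cannot be rearranged into the (10, 3) target of `layer_reshape`:

```python
import numpy as np

sample = np.zeros((10, 30))   # per-sample output of layer_dense(units = 30)

try:
    sample.reshape(10, 3)     # 300 values cannot fill a (10, 3) array of 30
    reshape_ok = True
except ValueError:
    reshape_ok = False

print(reshape_ok)  # False -- the reshape is impossible
```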

Answer 1 (Score: 0)

Well, since you have a time-dependent dataset, why not try the keras.layers.TimeDistributed API and see whether it suits your data sequences arranged by time points?
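For what it is worth, TimeDistributed(Dense(d)) applies the same dense weights independently at every time step, which for a plain dense map coincides with what Dense already does on a rank-3 input. A NumPy sketch of that equivalence (random, purely illustrative weights):

```python
import numpy as np

x = np.random.normal(size=(100, 10, 6))   # (batch, time steps, sensors)
kernel = np.random.normal(size=(6, 3))
bias = np.zeros(3)

# TimeDistributed semantics: apply the same dense map at each time step
per_step = np.stack(
    [x[:, t, :] @ kernel + bias for t in range(x.shape[1])], axis=1
)

# ...which matches the single tensordot a Dense layer performs on rank-3 input
direct = np.tensordot(x, kernel, axes=[[2], [0]]) + bias

print(np.allclose(per_step, direct))  # True
```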