How do I fix the Keras fit error "All input arrays (x) should have the same number of samples"?

Time: 2019-10-28 05:47:47

Tags: python keras autoencoder data-integration

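I am working from the example in stackblitz. The code works up to the third part of the example, where the autoencoder is supposed to be trained. I am new to Keras models, so I basically just copied and pasted the code, and I do not understand why the code on the website works while mine does not.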

I tried changing the call to the fit function from the first version below to the second:

estimator = autoencoder.fit([X_scRNAseq, X_scProteomics],
                            [X_scRNAseq, X_scProteomics],
                            epochs = 100, batch_size = 128,
                            validation_split = 0.2, shuffle = True, verbose = 1)

estimator = autoencoder.fit([X_scRNAseq, X_scRNAseq],
                            [X_scRNAseq, X_scRNAseq],
                            epochs = 100, batch_size = 128,
                            validation_split = 0.2, shuffle = True, verbose = 1)

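The second call avoids the same-number-of-samples error and runs without problems, but it does not train the autoencoder the way it is supposed to.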

X_scRNAseq and X_scProteomics are both numpy arrays, with shapes (36280, 8617) and (13, 8617) respectively. The model summary is:

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
scRNAseq (InputLayer)           (None, 8617)         0                                            
__________________________________________________________________________________________________
scProteomics (InputLayer)       (None, 8617)         0                                            
__________________________________________________________________________________________________
Encoder_scRNAseq (Dense)        (None, 50)           430900      scRNAseq[0][0]                   
__________________________________________________________________________________________________
Encoder_scProteomics (Dense)    (None, 10)           86180       scProteomics[0][0]               
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 60)           0           Encoder_scRNAseq[0][0]           
                                                                 Encoder_scProteomics[0][0]       
__________________________________________________________________________________________________
Bottleneck (Dense)              (None, 50)           3050        concatenate_1[0][0]              
__________________________________________________________________________________________________
Concatenate_Inverse (Dense)     (None, 60)           3060        Bottleneck[0][0]                 
__________________________________________________________________________________________________
Decoder_scRNAseq (Dense)        (None, 8617)         525637      Concatenate_Inverse[0][0]        
__________________________________________________________________________________________________
Decoder_scProteomics (Dense)    (None, 8617)         525637      Concatenate_Inverse[0][0]        
==================================================================================================
Total params: 1,574,464
Trainable params: 1,574,464
Non-trainable params: 0
__________________________________________________________________________________________________

The error I get when I call the fit function is:

ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(36280, 8617), (13, 8617)]

Thanks!

1 answer:

Answer 0 (score: 1)

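Keras expects the first axis of every input array to be the number of samples. As you said, X_scRNAseq has shape (36280, 8617) and X_scProteomics has shape (13, 8617). Their first axes are 36280 and 13, which do not match, so Keras raises the error. It looks like the 8617 samples sit on the second axis instead.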

I think the solution is to swap the axes of X_scRNAseq and X_scProteomics like this:

import numpy as np

X_scRNAseq = np.swapaxes(X_scRNAseq, 0, 1)          # shape becomes (8617, 36280)
X_scProteomics = np.swapaxes(X_scProteomics, 0, 1)  # shape becomes (8617, 13)
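
A quick way to confirm the effect of the swap before calling fit is to compare the length of axis 0 across the arrays. The following is only a minimal sketch: the helper names `sample_counts` and `same_number_of_samples` and the miniature arrays are made up for illustration, and the tiny shapes stand in for the real ones so the check stays cheap.

```python
import numpy as np

def sample_counts(arrays):
    """Length of axis 0 for each array; Keras treats axis 0 as the sample axis."""
    return [a.shape[0] for a in arrays]

def same_number_of_samples(arrays):
    """True when all arrays agree on the length of axis 0."""
    return len(set(sample_counts(arrays))) == 1

# Hypothetical miniature data laid out like the question's arrays (features x cells).
rna_small = np.zeros((6, 4))      # stands in for X_scRNAseq, shape (36280, 8617)
protein_small = np.zeros((3, 4))  # stands in for X_scProteomics, shape (13, 8617)

before = same_number_of_samples([rna_small, protein_small])

# Same operation as above: move the shared axis to the front.
rna_t = np.swapaxes(rna_small, 0, 1)
protein_t = np.swapaxes(protein_small, 0, 1)
after = same_number_of_samples([rna_t, protein_t])

print(before, after)  # False True
```

For 2-D arrays, `np.swapaxes(a, 0, 1)` gives the same result as `a.T`, so either spelling should work here.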

Then fit your model:

estimator = autoencoder.fit([X_scRNAseq, X_scProteomics],
                            [X_scRNAseq, X_scProteomics],
                            epochs = 100, batch_size = 128,
                            validation_split = 0.2, shuffle = True, verbose = 1)
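
One caveat that may be worth checking before running this: the parameter counts in the model summary suggest that both input layers were built for 8617 features, whereas the arrays have 36280 and 13 features once the axes are swapped. The sketch below only redoes that arithmetic. It assumes ordinary Dense layers with a bias term, and the helper names `dense_params` and `input_width` are made up for illustration.

```python
def dense_params(n_in, n_out):
    """Parameter count of a fully connected layer: weights plus one bias per unit."""
    return n_in * n_out + n_out

def input_width(params, n_out):
    """Invert dense_params to recover the input width from a summary row."""
    return (params - n_out) // n_out

# Rows taken from the model summary in the question.
rna_width = input_width(430900, 50)     # Encoder_scRNAseq
protein_width = input_width(86180, 10)  # Encoder_scProteomics
print(rna_width, protein_width)  # 8617 8617

# Feature counts of the arrays after swapping axes: (8617, 36280) and (8617, 13).
transposed_features = {"scRNAseq": 36280, "scProteomics": 13}
built_for = {"scRNAseq": rna_width, "scProteomics": protein_width}

# Inputs whose expected width no longer matches the data.
mismatched = [name for name in built_for
              if built_for[name] != transposed_features[name]]
print(mismatched)  # ['scRNAseq', 'scProteomics']
```

If the mismatch holds for your model, the `shape` passed to each `Input` layer and the unit counts of the two decoder layers would probably need to be set to 36280 and 13 before fit accepts the swapped arrays.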