Data types, data shapes, and pad_sequences

Time: 2017-05-29 08:44:11

Tags: python keras

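I can't make sense of the error message I get from this code. The x_train part comes from a working example that shows how to use an LSTM in Keras.

The mytrain part is just something I'm experimenting with to understand the various functions.

As you can see from the output, x_train and mytrain have the same type and shape.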

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
import numpy as np

max_features = 80
maxlen = 5

# from the example
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print('x_train type: ', type(x_train))
print('x_train shape:', x_train.shape)
sequence.pad_sequences(x_train, maxlen=maxlen)

# my test code
mytrain = np.ones_like(x_train)
print('mytrain type:', type(mytrain))
print('mytrain shape:', mytrain.shape)
mytrain2 = sequence.pad_sequences(mytrain, maxlen=maxlen)

Output:

D:\python\python.exe D:/workspace/YYYY/test/test_sequences.py
Using TensorFlow backend.
x_train type:  <class 'numpy.ndarray'>
x_train shape: (25000,)
Traceback (most recent call last):
  File "D:/workspace/YYYY/test/test_sequences.py", line 22, in <module>
    mytrain2 = sequence.pad_sequences(mytrain, maxlen=10)
  File "D:\python\lib\site-packages\keras\preprocessing\sequence.py", line 42, in pad_sequences
    'Found non-iterable: ' + str(x))
mytrain type: <class 'numpy.ndarray'>
ValueError: `sequences` must be a list of iterables. Found non-iterable: 1
mytrain shape: (25000,)

If I use mytrain = np.asarray([[1, 2, 3]]) (a list of iterables), it works, but I can't see what the difference is between x_train and mytrain in the code above.

1 answer:

Answer 0 (score: 2)

Problem:

When you print x_train, you get:

[ [1, 14, 22, 16, 43, 2, 2, 2, 2, 65, 2, 2, 66, 2, 4, 2, 36, 2, 5, 25, 2, 43, 2, 2, 50, 2, 2, 9, 35, 2, 2, 5, 2, 4, 2, 2, 2, 2, 2, 2, 39, 4, 2, 2, 2, 17, 2, 38, 13, 2, 4, 2, 50, 16, 6, 2, 2, 19, 14, 22, 4, 2, 2, 2, 4, 22, 71, 2, 12, 16, 43, 2, 38, 76, 15, 13, 2, 4, 22, 17, 2, 17, 12, 16, 2, 18, 2, 5, 62, 2, 12, 8, 2, 8, 2, 5, 4, 2, 2, 16, 2, 66, 2, 33, 4, 2, 12, 16, 38, 2, 5, 25, 2, 51, 36, 2, 48, 25, 2, 33, 6, 22, 12, 2, 28, 77, 52, 5, 14, 2, 16, 2, 2, 8, 4, 2, 2, 2, 15, 2, 4, 2, 7, 2, 5, 2, 36, 71, 43, 2, 2, 26, 2, 2, 46, 7, 4, 2, 2, 13, 2, 2, 4, 2, 15, 2, 2, 32, 2, 56, 26, 2, 6, 2, 2, 18, 4, 2, 22, 21, 2, 2, 26, 2, 5, 2, 30, 2, 18, 51, 36, 28, 2, 2, 25, 2, 4, 2, 65, 16, 38, 2, 2, 12, 16, 2, 5, 16, 2, 2, 2, 32, 15, 16, 2, 19, 2, 32]
 ...,
 [1, 17, 6, 2, 2, 7, 4, 2, 22, 45, 2, 8, 2, 14, 2, 4, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 39, 14, 2, 4, 2, 9, 2, 50, 2, 12, 47, 4, 2, 5, 2, 7, 38, 2, 2, 2, 7, 4, 2, 2, 9, 24, 6, 78, 2, 17, 2, 2, 21, 27, 2, 2, 5, 2, 2, 2, 2, 4, 2, 7, 4, 2, 42, 2, 2, 35, 2, 2, 29, 2, 27, 2, 8, 2, 12, 2, 21, 2, 2, 9, 6, 66, 78, 2, 4, 2, 2, 5, 2, 2, 2, 2, 6, 2, 8, 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 2, 21, 60, 27, 2, 9, 43, 2, 2, 2, 10, 10, 12, 2, 40, 4, 2, 20, 12, 16, 5, 2, 2, 72, 7, 51, 6, 2, 22, 4, 2, 2, 9]]

Each element is a list, whereas mytrain is:

[1 1 1 ..., 1 1 1]

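This is just a flat array of integers, so pad_sequences finds a non-iterable element (the integer 1) where it expects a sequence.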
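
The difference does not show up in type() or .shape, only in the elements themselves. Here is a minimal sketch of that, using a tiny hand-made array (an assumption standing in for the real IMDB data, with invented names x_small and my_small) instead of imdb.load_data:

```python
import numpy as np

# Stand-in for x_train: a 1-D object array whose elements are Python lists
# of different lengths (made-up values, not the real IMDB data).
x_small = np.array([[1, 14, 22, 16], [1, 17, 6]], dtype=object)

# np.ones_like copies only the outer shape and dtype, so each element
# becomes the single value 1 rather than a list of ones.
my_small = np.ones_like(x_small)

print(type(x_small), x_small.shape)    # same outer type and shape ...
print(type(my_small), my_small.shape)
print(type(x_small[0]))                # ... but this element is a list
print(type(my_small[0]))               # ... and this one is a single integer
```

Checking type(x_train[0]) in the original script should reveal the same mismatch.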

SOLUTION:

This should do what you need:

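mytrain = []
for i in range(x_train.shape[0]):
    # one integer array of ones per review, matching that review's length
    mytrain.append(np.ones(len(x_train[i]), dtype=int))
# dtype=object keeps the ragged rows as a 1-D array; recent NumPy requires it
mytrain = np.asarray(mytrain, dtype=object)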

Indeed, the printed type and shape are unchanged:

('x_train type: ', <type 'numpy.ndarray'>)
('x_train shape:', (25000,))
('mytrain type:', <type 'numpy.ndarray'>)
('mytrain shape:', (25000,))
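
To see why each element has to be a sequence, here is a rough NumPy-only sketch of padding with the pad_sequences defaults (zeros added at the front, long sequences cut at the front). The helper pad_pre is hypothetical, written for illustration only, and is not the Keras implementation; it assumes maxlen is a positive integer.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical stand-in for pad_sequences with 'pre' padding and truncating."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        # list(seq) fails on a bare integer, which is the situation
        # reported in the question's traceback.
        trunc = list(seq)[-maxlen:]        # keep only the last maxlen items
        if trunc:
            out[row, -len(trunc):] = trunc  # right-align, leaving padding in front
    return out

# Two sequences of ones with different lengths, like the rows of mytrain above.
mytrain_small = [np.ones(3, dtype=int), np.ones(7, dtype=int)]
padded = pad_pre(mytrain_small, maxlen=5)
print(padded)
```

The short row is left-padded with zeros and the long row is cut down to its last five items, so the result is a regular 2 x 5 integer array.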