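Keras Sequential model fit with CountVectorizer: input shape

Time: 2020-06-14 10:10:18

Tags: python keras

I am trying to train a model with the code below. Any help understanding the shapes would be appreciated.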

''' split the sample '''
X_train, X_test, y_train, y_test = train_test_split(csvFile['Tweet'], csvFile['sent_score'], test_size= 0.20, random_state=1000)
'''add layers to define the input dimension of our feature vectors'''

vectorizer = CountVectorizer()
vectorizer.fit(X_train)
X_train = vectorizer.transform(X_train)
X_test  = vectorizer.transform(X_test)

model = Sequential()
#model.add(Reshape((X_train.shape[1],1)))
#model.add(layers.Dense(10, input_dim=(X_train.shape[1],1), activation='relu'))
model.add(layers.Dense(10, input_dim=1, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))


''' specifies the optimizer and the loss function '''
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

X_train = np.asarray(X_train)
y_train = np.asarray(y_train)

''' fit the model '''
history = model.fit(X_train, y_train, epochs=10)

Error:

Traceback (most recent call last):
  File "test.py", line 212, in <module>
    history = model.fit(X_train, y_train, epochs=10)#, verbose=False, validation_data=(X_test, y_test), batch_size=10)
  File "/Users/delalma/Library/Python/3.7/lib/python/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/Users/delalma/Library/Python/3.7/lib/python/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/Users/delalma/Library/Python/3.7/lib/python/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected dense_1_input to have 2 dimensions, but got array with shape ()

X_test:

  (0, 5846) 1
  (0, 6091) 1
  (0, 6921) 1
  (0, 7450) 1
  (0, 7682) 1
  (0, 8581) 1
  (0, 9885) 1
  (0, 13119)    1
  (0, 21322)    1
  (0, 21816)    1
  (0, 23101)    1

y_test:

9183    -1
16444    1
12410    1
7879     1
17775    1
        ..
15611    1
3776     1
6215    -1
4695     1
9651     1
Name: sent_score, Length: 23136, dtype: int64

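I understand that the input needs to be a 2-D array, but the vectorized X_test reports shape (), and I don't know how to handle this error.

1 answer:

Answer 0 (score: 1)

Here is a complete working example: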

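import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from keras.models import Sequential
from keras import layers

X_train = ['hello adsa sdajka', 'sadk asoa nasdl', 'adk dsao dsak ksasad']
X_test = ['sajjk hello dlsl sdajka', 'daskk sadk']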

y_train = [0,1,1]
y_test = [1,0]

vectorizer = CountVectorizer()
vectorizer.fit(X_train)
X_train = vectorizer.transform(X_train).toarray() # remember toarray()
X_test  = vectorizer.transform(X_test).toarray() # remember toarray()

model = Sequential()
model.add(layers.Dense(10, input_dim=X_train.shape[-1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()  # prints the summary itself; wrapping it in print() also prints "None"

y_train = np.asarray(y_train)
y_test = np.asarray(y_test)

history = model.fit(X_train, y_train, epochs=10)
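
The `toarray()` calls are the key change. As a minimal sketch of why (reusing the toy sentences from the example above; the names `docs`, `sparse_X`, `wrapped` and `dense_X` are illustrative only), `CountVectorizer` returns a SciPy sparse matrix, and `np.asarray` does not densify it. It wraps the whole matrix object in a 0-dimensional object array, which appears to be where the shape `()` in the traceback comes from:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, same as the training sentences in the answer.
docs = ['hello adsa sdajka', 'sadk asoa nasdl', 'adk dsao dsak ksasad']

vectorizer = CountVectorizer()
sparse_X = vectorizer.fit_transform(docs)  # SciPy sparse matrix, 2-D

# np.asarray treats the sparse matrix as one opaque Python object,
# so the result is a 0-dimensional array of dtype object.
wrapped = np.asarray(sparse_X)
print(wrapped.shape, wrapped.dtype)

# toarray() builds the dense 2-D array that a Dense layer can consume.
dense_X = sparse_X.toarray()
print(dense_X.shape, dense_X.ndim)
```

For a large vocabulary the dense array can be memory-hungry, so converting with `toarray()` is a reasonable choice for small data like this, not necessarily for a corpus of any size.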

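Your neural network receives an array with dimensions (n_samples, n_features), so input_dim must equal n_features, not 1. With CountVectorizer the features are simply a matrix of token counts, so the dimensions are (n_samples, number of distinct words in the training set).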
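
The relationship between the vocabulary and the feature dimension can be checked directly. The following is a small sketch under the same toy data as the answer (the variable names `train_docs`, `test_docs`, `train_X`, `test_X` and `n_features` are illustrative, not from the original post). It shows that the number of columns equals the size of the vocabulary learned from the training set, and that test documents are mapped to the same width, with unseen words ignored:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ['hello adsa sdajka', 'sadk asoa nasdl', 'adk dsao dsak ksasad']
test_docs = ['sajjk hello dlsl sdajka', 'daskk sadk']

vectorizer = CountVectorizer()
train_X = vectorizer.fit_transform(train_docs).toarray()
test_X = vectorizer.transform(test_docs).toarray()

# One column per distinct token seen during fit().
n_features = len(vectorizer.vocabulary_)

print(train_X.shape)        # (n_train_samples, n_features)
print(test_X.shape)         # (n_test_samples, n_features), same width
print(test_X.sum(axis=1))   # per-document count of tokens known from training
```

Because the width is fixed by the training vocabulary, `input_dim=X_train.shape[-1]` in the answer stays valid for any later test or prediction data transformed by the same fitted vectorizer.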