AttributeError: 'str' object has no attribute 'ndim'

Time: 2018-02-07 18:34:44

Tags: python string tensorflow deep-learning keras

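I am implementing sentiment analysis with Keras. My training data is as follows:

  • pos.txt: a text file of all the positive reviews, one per line
  • neg.txt: a text file of all the negative reviews, one per line

I structured my code in a similar way to the example here.

The only difference is that their data is imported from the Keras datasets, while mine comes from text files.

Here is my code: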

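# CNN for the IMDB problem
import numpy
from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding, Conv1D, MaxPooling1D

top_words = 5000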

pos_file = open('pos.txt', 'r')
neg_file = open('neg.txt', 'r')
# Load data from files
pos = list(pos_file.readlines())
neg = list(neg_file.readlines())
x = pos + neg
total = numpy.array(x)
# Generate labels
positive_labels = [1 for _ in pos]
negative_labels = [0 for _ in neg]
y = numpy.concatenate([positive_labels, negative_labels], 0)

# Testing
pos_test = open('posTest.txt', 'r')
posT = list(pos_test.readlines())
print("pos length is", len(posT))

neg_test = open('negTest.txt', 'r')
negT = list(neg_test.readlines())
xTest = posT + negT
total2 = numpy.array(xTest)

# Generate labels
positive_labels2 = [1 for _ in posT]
negative_labels2 = [0 for _ in negT]
yTest = numpy.concatenate([positive_labels2, negative_labels2], 0)

#Create model
max_words = 1
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))

model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1))
model.add(Flatten())
model.add(Dense(250, activation='relu'))

model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

#Fit the model

model.fit(total, y, validation_data=(xTest, yTest), epochs=2, batch_size=128, verbose=2)

# Final evaluation of the model
scores = model.evaluate(total2, yTest, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

When I run my code, I get this error:

File "C:\Users\\Anaconda3\lib\site-packages\keras\engine\training.py", line 70, in <listcomp>
data = [np.expand_dims(x, 1) if x is not None and x.ndim == 1 else x for x in data]

AttributeError: 'str' object has no attribute 'ndim'

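1 answer:

Answer 0 (score: 2)

You are feeding the model a list of strings, which it does not expect. You can use the keras.preprocessing.text module to convert the text into integer sequences. More specifically, you can prepare the data as follows: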

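from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# texts: list of review strings; maxlen: sequence length to pad or truncate to
tk = Tokenizer()
tk.fit_on_texts(texts)  # build the word -> integer index
index_list = tk.texts_to_sequences(texts)  # one list of word indices per text
x_train = pad_sequences(index_list, maxlen=maxlen)  # pad/truncate to equal length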

Now x_train (an integer ndarray of shape n_samples × maxlen) is a legitimate input for the model.
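To make the result of that conversion concrete, here is a minimal NumPy-only sketch of what the tokenizing and padding steps produce. It is not Keras's implementation: the helpers `tokenize`, `build_word_index`, and `texts_to_padded`, the regular expression, and the three sample reviews are all assumptions made for illustration. It only mimics the default behaviour of `pad_sequences`, which left-pads with zeros and keeps the last `maxlen` tokens.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_index(texts):
    """Rank words by frequency: 1 is the most frequent, 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def texts_to_padded(texts, word_index, maxlen):
    """Return an int32 array of shape (n_samples, maxlen), left-padded with zeros."""
    x = np.zeros((len(texts), maxlen), dtype="int32")
    for row, text in enumerate(texts):
        ids = [word_index[w] for w in tokenize(text) if w in word_index]
        ids = ids[-maxlen:]  # truncate from the front, keeping the last maxlen tokens
        if ids:
            x[row, -len(ids):] = ids
    return x


# Hypothetical stand-ins for the lines read from pos.txt and neg.txt.
texts = ["good movie\n", "bad movie\n", "good good fun\n"]

word_index = build_word_index(texts)
x_train = texts_to_padded(texts, word_index, maxlen=4)

print(hasattr(texts[0], "ndim"))  # False: a plain str has no ndim attribute
print(x_train.ndim, x_train.shape)  # 2 (3, 4)
```

With the real Tokenizer, passing `num_words=top_words` should keep every index below `top_words`, which is what `Embedding(top_words, ...)` requires. The `input_length` of the Embedding layer would then need to match the `maxlen` used for padding rather than 1.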