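I have a TensorFlow LSTM model for predicting market sentiment. I built the model with a maximum sequence length of 150 (the maximum number of words). To make predictions, I wrote the following code: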
import numpy as np

batchSize = 32
maxSeqLength = 150

def getSentenceMatrix(sentence):
    # Only row 0 is filled; the remaining rows stay zero to pad out the batch.
    sentenceMatrix = np.zeros([batchSize, maxSeqLength], dtype='int32')
    cleanedSentence = cleanSentences(sentence)
    # Keep only the first 150 words; everything after that is dropped.
    cleanedSentence = ' '.join(cleanedSentence.split()[:150])
    split = cleanedSentence.split()
    for indexCounter, word in enumerate(split):
        try:
            sentenceMatrix[0, indexCounter] = wordsList.index(word)
        except ValueError:
            sentenceMatrix[0, indexCounter] = 399999  # Vector for unknown words
    return sentenceMatrix

input_text = "example data"
inputMatrix = getSentenceMatrix(input_text)
In the code, I truncate the input text to 150 words and ignore the remaining data:
cleanedSentence = ' '.join(cleanedSentence.split()[:150])
Because of this, my predictions are wrong. Can someone help me solve this problem?
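One possible direction, shown only as a rough sketch: instead of discarding everything after word 150, split the token ids into consecutive windows of maxSeqLength and place each window in its own row of the batch. The helper name `chunk_token_ids` and its return shape are hypothetical and not part of the code above; the token ids are assumed to come from the same `wordsList.index` lookup (with 399999 for unknown words).

```python
import numpy as np

def chunk_token_ids(token_ids, max_seq_length=150, batch_size=32):
    """Split token ids into zero-padded windows, grouped into fixed-size batches.

    Hypothetical helper. Returns (batches, n_chunks), where batches has shape
    [n_batches, batch_size, max_seq_length]. Only the first n_chunks rows
    (counted across batches) hold real text; the rest are zero padding.
    """
    # Number of windows needed to cover every token (at least one).
    n_chunks = max(1, -(-len(token_ids) // max_seq_length))
    # Round the row count up to a whole number of batches.
    n_batches = -(-n_chunks // batch_size)
    flat = np.zeros(n_batches * batch_size * max_seq_length, dtype='int32')
    flat[:len(token_ids)] = token_ids
    # Row-major reshape: tokens 0..149 land in row 0, 150..299 in row 1, and so on.
    return flat.reshape(n_batches, batch_size, max_seq_length), n_chunks

# Example: 400 token ids need three windows, which fit in a single batch.
batches, n_chunks = chunk_token_ids(list(range(1, 401)))
print(batches.shape, n_chunks)
```

Each batch could then be fed to the model in turn, and the outputs of the first n_chunks rows combined, for example by averaging. Whether averaging per-window predictions is meaningful is an assumption here, since the model was trained on inputs of at most 150 words.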