Bidirectional LSTM gives NaN loss

Time: 2020-07-29 17:09:24

Tags: python tensorflow keras deep-learning lstm

I am classifying emotions using Twitter's emotion dataset. To do that I wrote the code below, but when I train it the loss comes out as NaN, and I could not figure out what was wrong. I eventually found a way to make it work, but I still don't understand why the problem occurred in the first place.

Code:

import pandas as pd
import numpy as np
import re

cols = ["id","text","emotion","intensity"]

anger_df_train = pd.read_csv("D:/Dataset/twitter_emotion/train/anger.csv",delimiter='\t',names=cols)
fear_df_train = pd.read_csv("D:/Dataset/twitter_emotion/train/fear.csv",delimiter='\t',names=cols)
joy_df_train = pd.read_csv("D:/Dataset/twitter_emotion/train/joy.csv",delimiter='\t',names=cols)
sadness_df_train = pd.read_csv("D:/Dataset/twitter_emotion/train/sadness.csv",delimiter='\t',names=cols)

df_train = pd.concat([anger_df_train,fear_df_train,joy_df_train,sadness_df_train])

import spacy

nlp = spacy.load('en_core_web_md')

doc = nlp("The big grey dog ate all of the chocolate, but fortunately he wasn't sick!")

def spacy_tokenizer(sentence):
    emails = '[A-Za-z0-9]+@[a-zA-Z].[a-zA-Z]+'
    websites = '(http[s]*:[/][/])[a-zA-Z0-9]'
    mentions = '@[A-Za-z0-9]+'
    sentence = re.sub(emails,'',sentence)
    sentence = re.sub(websites,'',sentence)
    sentence = re.sub(mentions,'',sentence)
    sentence_list=[word.lemma_ for word in nlp(sentence) if not (word.is_stop or word.is_space or word.like_num or len(word)==1)]
    return ' '.join(sentence_list)

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

df_train['new_text']=df_train['text'].apply(spacy_tokenizer)

tokenizer = Tokenizer(num_words=10000)

tokenizer.fit_on_texts(df_train['new_text'].values)

sequences = tokenizer.texts_to_sequences(df_train['new_text'].values)

word_index = tokenizer.word_index

text_embedding = np.zeros((len(word_index)+1,300))

for word,i in word_index.items():
    text_embedding[i]=nlp(word).vector

labels = df_train['emotion'].unique()

label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)

train_label = np.array(label_tokenizer.texts_to_sequences(df_train['emotion'].values))

maxlen = 100  # assumed value; maxlen is not defined in the original post
train_padd = pad_sequences(sequences,maxlen=maxlen,padding='post',truncating='post')

embedding_dim=300

from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras.layers import Embedding, Flatten, Dense
from tensorflow.keras.layers import Dense, LSTM, Embedding,Dropout,SpatialDropout1D,Conv1D,MaxPooling1D,GRU,BatchNormalization
from tensorflow.keras.layers import Input,Bidirectional,GlobalAveragePooling1D,GlobalMaxPooling1D,concatenate,LeakyReLU

model=models.Sequential()

model.add(Embedding(input_dim=text_embedding.shape[0],output_dim=text_embedding.shape[1],weights=[text_embedding],input_length=maxlen,trainable=False))

model.add(Bidirectional(LSTM(embedding_dim)))
model.add(Dense(embedding_dim,activation='tanh'))
model.add(Dense(4,activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(train_padd,train_label,test_size=0.2)

num_epochs = 10
history = model.fit(X_train, y_train,epochs=num_epochs,verbose=1)

Training:

Epoch 1/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0035
Epoch 2/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 3/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 4/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 5/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 6/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 7/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 8/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 9/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 10/10
91/91 [==============================] - 1s 15ms/step - loss: nan - accuracy: 0.0000e+00

To get around the problem, instead of using the Tokenizer to encode the labels, I used the following:

label_map={'anger': 0, 'fear': 1, 'joy': 2, 'sadness': 3}
df_train['emotion'] = df_train['emotion'].map(label_map)
from tensorflow.keras.utils import to_categorical
categorical_labels = to_categorical(df_train['emotion'].values,num_classes=4)

X_train,X_test,y_train,y_test=train_test_split(train_padd,categorical_labels,test_size=0.2,shuffle=True)

When compiling the model, I also changed sparse_categorical_crossentropy to categorical_crossentropy. After that it worked. Please explain what was wrong with my original approach.
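For reference, the only change to the model itself is the compile call; the snippet below is roughly what it becomes, with the rest of the architecture unchanged:

# Labels are now one-hot vectors from to_categorical, so use categorical_crossentropy.
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=num_epochs, verbose=1)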

1 Answer:

Answer 0 (score: 1)

The reason sparse_categorical_crossentropy produces NaN is that when the labels are encoded with the Tokenizer, the resulting train-labels array looks like this:

array([[1],
       [2],
       [3],
       [4],
       [1],
       [2],
       [3]])

However, sparse_categorical_crossentropy expects integer class indices in the range 0 to num_classes-1. The Tokenizer starts its indices at 1, so the labels above run from 1 to 4, and label 4 falls outside the valid range of the model's 4-unit softmax output, which is what drives the loss to NaN. The train-labels array should instead look like this:

array([0, 1, 2, 3, 0, 1, 2])
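
A quick sanity check makes the mismatch visible (a minimal sketch, assuming train_label is the array produced by the label Tokenizer above):

import numpy as np

# Tokenizer-encoded labels run from 1 to 4, but the 4-unit softmax expects 0 to 3.
print(np.unique(train_label))               # e.g. [1 2 3 4]
print(train_label.min(), train_label.max())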

So, your code can be made to work with the sparse_categorical_crossentropy loss by encoding the labels like this:

label_map = {'anger': 0, 'fear': 1, 'joy': 2, 'sadness': 3}
df_train['emotion'] = df_train['emotion'].map(label_map)

sparse_categorical_labels = df_train['emotion'].values

X_train, X_test, y_train, y_test = train_test_split(train_padd, sparse_categorical_labels, test_size=0.2, shuffle=True)
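
Alternatively (a minimal sketch that is not part of the original answer), instead of mapping the strings manually you can keep the label Tokenizer and simply shift its 1-based indices down to the 0-based range that sparse_categorical_crossentropy expects:

# texts_to_sequences returns 1-based indices with shape (n, 1);
# flatten to (n,) and subtract 1 to get class indices 0..3.
sparse_labels = np.array(label_tokenizer.texts_to_sequences(df_train['emotion'].values)).reshape(-1) - 1

X_train, X_test, y_train, y_test = train_test_split(train_padd, sparse_labels, test_size=0.2, shuffle=True)

With the labels in the 0-3 range, the original model compiled with sparse_categorical_crossentropy trains without the loss going to NaN.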