Not using recurrent_dropout crashes the model in Colab?

Date: 2020-05-17 23:57:50

Tags: python python-3.x keras google-colaboratory tensorflow2.0

I'm trying to train a simple TensorFlow model to detect the sentiment of tweets. The data types and sizes of the arrays are consistent, and the model trains fine when recurrent_dropout is set to some float value. However, that disables cuDNN, whose speed-up I'd really like to have (wouldn't we all), yet whenever I remove the recurrent_dropout argument, training crashes before the end of the first epoch.

Below is the relevant code; I've omitted the imports and the loading of the csv files. After the code are the final input dimensions and the error message. Also, I've figured out why Colab seemed to be cutting down the training data: Colab displays the number of steps after the data is split into batches, so with the default batch size of 32 we get 859 steps. The crash when not using recurrent dropout is still the issue. As a side note, this code is a very rough draft, with the data cleaning all done in the same notebook, hence the lack of typical formatting.

def remove_case(X):
    removed_case = []
    X = X.copy()
    for text in X:
        text = str(text).lower()
        removed_case.append(text)
    X = removed_case
    return X


def remove_hyperlinks(X):
    removed_hyperlinks = []
    X = X.copy()
    for text in X:
        text = str(text)
        text = re.sub(r'http\S+', '', text)
        text = re.sub(r'https\S+', '', text)
        text = re.sub(r'www\S+', '', text)
        removed_hyperlinks.append(text)
    X = removed_hyperlinks
    return X


def remove_punctuation(X):
    removed_punc = []
    X = X.copy()
    for text in X:
        text = str(text)
        text = "".join([char for char in text if char not in punctuation])
        removed_punc.append(text)
    X = removed_punc
    return X


def split_text(X):
    split_tweets = []
    X = X.copy()
    for text in X:
        text = str(text).split()
        split_tweets.append(text)
    X = split_tweets
    return X


def map_sentiment(X, l, m, n):
    keys = ['negative', 'neutral', 'positive']
    values = [l, m, n]
    dictionary = dict(zip(keys, values))
    X = X.copy()
    X = X.map(dictionary)
    return X


# def sentiment_to_onehot(X):
#     sentiment_foofs = []
#     X = X.copy()
#     for integer in X:
#         if integer == "negative":  # Negative
#             integer = [1, 0, 0]
#         elif integer == "neutral":  # Neutral
#             integer = [0, 1, 0]
#         elif integer == "positive":  # Positive
#             integer = [0, 0, 1]
#         else:
#             break
#         sentiment_foofs.append(integer)
#     X = sentiment_foofs
#     return X


train_no_punc_lowercase = train.copy()
train_no_punc_lowercase['text'] = remove_case(train_no_punc_lowercase['text'])
train_no_punc_lowercase['text'] = remove_hyperlinks(train_no_punc_lowercase['text'])
train_no_punc_lowercase['text'] = remove_punctuation(train_no_punc_lowercase['text'])
train_no_punc_lowercase['sentiment'] = map_sentiment(train_no_punc_lowercase['sentiment'], 0, 1, 2)
train_no_punc_lowercase.head()

test_no_punc_lowercase = test.copy()
test_no_punc_lowercase['text'] = remove_case(test_no_punc_lowercase['text'])
test_no_punc_lowercase['text'] = remove_hyperlinks(test_no_punc_lowercase['text'])
test_no_punc_lowercase['text'] = remove_punctuation(test_no_punc_lowercase['text'])
test_no_punc_lowercase['sentiment'] = map_sentiment(test_no_punc_lowercase['sentiment'], 0, 1, 2)

features = train.columns.tolist()
features.remove('textID')  # all unique, high cardinality feature
features.remove('selected_text')  # target
target = 'selected_text'

X_train_no_punc_lowercase = train_no_punc_lowercase[features]
y_train_no_punc_lowercase = train_no_punc_lowercase[target]
X_test_no_punc_lowercase = test_no_punc_lowercase[features]


def stemming_column(df_column):
    ps = PorterStemmer()
    stemmed_word_list = []
    for i, string in enumerate(df_column):
        tokens = word_tokenize(string)
        new_string = ""
        for j, words in enumerate(tokens):
            new_string = new_string + ps.stem(words) + " "
        stemmed_word_list.append(new_string)
    return stemmed_word_list


def create_lookup_table(list1, list2):
    main_list = []
    lookup_dict = {}
    i = 1  # used to create a value in the dictionary
    main_list.append(list1)
    main_list.append(list2)
    for list in main_list:
        for string in list:
            for word in string.split():
                if word not in lookup_dict:
                    lookup_dict[word] = i
                    i += 1
    return lookup_dict


def encode(input_list, input_dict):
    encoded_list = []
    for string in input_list:
        sentence_list = []
        for word in string.split():
            sentence_list.append(input_dict[word])  # value lookup from dictionary.. int
        encoded_list.append(sentence_list)
    return encoded_list


def pad_data(list_of_lists):
    padded_data = tf.keras.preprocessing.sequence.pad_sequences(list_of_lists, padding='post')
    return padded_data


def create_array_sentiment_integers(list):
    sent_int_list = []
    for sentiment in list:
        sent_int_list.append(sentiment)
    return np.asarray(sent_int_list, dtype=np.int32)


X_train_stemmed_list = stemming_column(X_train_no_punc_lowercase['text'])
X_test_stemmed_list = stemming_column(X_test_no_punc_lowercase['text'])
lookup_table = create_lookup_table(X_train_stemmed_list, X_test_stemmed_list)

X_train_encoded_list = encode(X_train_stemmed_list, lookup_table)
X_train_padded_data = pad_data(X_train_encoded_list)

Y_train = create_array_sentiment_integers(train_no_punc_lowercase['sentiment'])
max_features = 3  # 3 choices 0, 1, 2

Y_train_final = np.zeros((Y_train.shape[0], max_features), dtype=np.float32)
Y_train_final[np.arange(Y_train.shape[0]), Y_train] = 1.0

input_dimension = len(lookup_table) + 1
output_dimension = 64
input_length = 33

model = Sequential()
model.add(tf.keras.layers.Embedding(input_dim=input_dimension,
                                    output_dim=output_dimension,
                                    input_length=input_length,
                                    mask_zero=True))

model.add(tf.keras.layers.LSTM(512, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))
model.add(tf.keras.layers.Dense(256, activation='sigmoid'))

model.add(tf.keras.layers.Dropout(0.2))
model.add(Dense(3, activation='softmax'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X_train_padded_data, Y_train_final, validation_split=0.20, epochs=10)

model.save('Tweet_sentiment.model')

Also, here are the shapes of the dataset.

x train shape:  (27481, 33, 1)
x train type:  <class 'numpy.ndarray'>
y train shape:  (27481, 3)

Error message:

Epoch 1/3
363/859 [===========>..................] - ETA: 9s - loss: 0.5449 - accuracy: 0.5674
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-103-1d4af3962607> in <module>()
----> 1 model.fit(X_train_padded_data, Y_train_final, epochs=3,)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError:  [_Derived_]  CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1496): 'cudnnSetRNNDataDescriptor( data_desc.get(), data_type, layout, max_seq_length, batch_size, data_size, seq_lengths_array, (void*)&padding_fill)'
     [[{{node cond_38/then/_0/CudnnRNNV3}}]]
     [[sequential_5/lstm_4/StatefulPartitionedCall]] [Op:__inference_train_function_36098]

Function call stack:
train_function -> train_function -> train_function

1 answer:

Answer 0 (score: 0):

I see some issues in your code. They are mentioned below:

  • You are using input_dimension = len(lookup_table) + 1, and len(lookup_table) is just the number of time steps. Its value will be very high, at least more than 30,000, and it is recommended to use only a subset of those values. So you can set input_dimension = 10000 or input_dimension = 15000 (you can experiment with this value) and it should resolve the problem. That said, it will not impact the accuracy of the model.

  • Why does setting recurrent_dropout to a float value work ==> When we set recurrent_dropout, it actually drops some of the time steps (input_dimension in your case), and hence it does not crash.

  • You should use return_sequences=True only if there is another LSTM layer after an LSTM layer. Since you have only one LSTM layer, return_sequences should be set to False (see the sketch after this list).
  • Since you have 3 classes, you shouldn't use binary_crossentropy. You should use sparse_categorical_crossentropy if you are not one-hot encoding your target, or categorical_crossentropy if you are one-hot encoding it.
  • Are you sure you want to use masking (mask_zero=True) in the Embedding layer?
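Putting those points together, the model might look something like the sketch below. It keeps the question's layer sizes, assumes the one-hot encoded Y_train_final (hence categorical_crossentropy), and caps input_dim at the 10000 suggested in the first bullet; treat it as an illustrative sketch to tune, not a guaranteed drop-in fix.

import tensorflow as tf

# Sketch of the model with the bullets above applied (illustrative, untested).
# Note: if input_dim is capped at 10000, the token ids in X_train_padded_data
# must also stay below 10000 (e.g. np.minimum(X_train_padded_data, 9999)),
# because Embedding only accepts indices smaller than input_dim.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000,   # capped vocabulary, per the first bullet
                              output_dim=64,
                              input_length=33),  # mask_zero left off, per the last bullet
    tf.keras.layers.LSTM(512, dropout=0.2),      # single LSTM: return_sequences defaults to False
    tf.keras.layers.Dense(256, activation='sigmoid'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation='softmax'),
])

model.compile(loss='categorical_crossentropy',   # Y_train_final is one-hot encoded
              optimizer='adam',
              metrics=['accuracy'])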

Also, I see that you are using many functions and many lines of code for data preprocessing, such as removing hyperlinks, removing punctuation, tokenizing, etc.

So, I thought I would provide an end-to-end tutorial for text classification, which should help you as well as the Stack Overflow community.
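In that spirit, here is a minimal end-to-end sketch (not the answer's original tutorial code): it uses tf.keras.preprocessing.text.Tokenizer in place of the hand-rolled lookup table, toy texts and labels as stand-ins for the csv files, and integer labels with sparse_categorical_crossentropy. The texts, labels, and hyperparameters are illustrative assumptions.

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy stand-ins for the tweet csv files (illustrative only).
texts = ["i love this", "this is terrible", "it is okay i guess",
         "what a great day", "worst service ever", "nothing special"]
labels = [2, 0, 1, 2, 0, 1]  # 0 = negative, 1 = neutral, 2 = positive

vocab_size = 10000
max_len = 33

# Tokenizer lowercases and strips punctuation by default, replacing the
# hand-rolled remove_case / remove_punctuation / lookup-table functions.
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
X = pad_sequences(sequences, maxlen=max_len, padding='post')
y = np.asarray(labels, dtype=np.int32)  # integer labels, no one-hot needed

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=64,
                              input_length=max_len),
    tf.keras.layers.LSTM(64, dropout=0.2),
    tf.keras.layers.Dense(3, activation='softmax'),
])

# sparse_categorical_crossentropy pairs with integer (non-one-hot) labels.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X, y, epochs=2, verbose=0)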

For more information, please refer to this Beautiful Article.

Hope this resolves your issue. Happy learning!