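I followed a tutorial on text classification with a deep neural network in Keras, but when I run the code below several times, I get different results.
For example, the test loss was 0.88815 in the first run and 0.89030 in the second, slightly higher. Where does this randomness come from?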
import keras
from keras.datasets import reuters
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)
word_index = reuters.get_word_index(path="reuters_word_index.json")
print('# of Training Samples: {}'.format(len(x_train)))
print('# of Test Samples: {}'.format(len(x_test)))
num_classes = max(y_train) + 1
print('# of Classes: {}'.format(num_classes))
index_to_word = {}
for key, value in word_index.items():
    index_to_word[value] = key
# load_data() shifts word indices by index_from=3 (0 = padding, 1 = start, 2 = OOV)
print(' '.join([index_to_word.get(x - 3, '?') for x in x_train[0]]))
print(y_train[0])
from keras.preprocessing.text import Tokenizer
max_words = 10000
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(x_train[0])
print(len(x_train[0]))
print(y_train[0])
print(len(y_train[0]))
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.metrics_names)
batch_size = 32
epochs = 3
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)
score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
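Part of the run-to-run variation comes from weight initialization. In Keras 2, `Dense` layers default to `kernel_initializer='glorot_uniform'`, which draws each weight from a uniform distribution on (-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The NumPy sketch below only imitates that rule (it is not Keras's own code) to show that two unseeded draws for the first layer (10000 inputs, 512 units) start from different weights. The dropout masks and the per-epoch shuffling inside `fit` are further random draws on top of this.

```python
import numpy as np


def glorot_uniform(fan_in: int, fan_out: int, rng: np.random.Generator) -> np.ndarray:
    """Sketch of the Glorot/Xavier uniform rule behind Keras's default initializer."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


# Two "runs" with unseeded generators, mimicking two executions of the script.
w_run1 = glorot_uniform(10000, 512, np.random.default_rng())
w_run2 = glorot_uniform(10000, 512, np.random.default_rng())

print(w_run1.shape)                    # (10000, 512)
print(np.array_equal(w_run1, w_run2))  # False: each run starts from different weights
```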
Answer 0 (score: 1)
If you want the same result every time, you need to set a random seed. See also https://machinelearningmastery.com/reproducible-results-neural-networks-keras/.
You can do this by adding the following:
from numpy.random import seed
seed(42)
If you are using the TensorFlow backend, you also need to add:
from tensorflow import set_random_seed  # TensorFlow 1.x; in 2.x use tf.random.set_seed(42)
set_random_seed(42)
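42 is just an arbitrary number; you can pick any value. The seed is simply a fixed starting point for the random number generator, so the weights receive the same random initialization on every run, and that in turn gives you the same results.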
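A quick way to see what a fixed seed buys you: reseeding NumPy's global generator replays exactly the same sequence of numbers, so anything drawn from it afterwards repeats. The function `simulated_draws` below is hypothetical; it only stands in for the kinds of draws a training run makes (initial weights, a dropout mask, a shuffled sample order) and does not call Keras.

```python
import numpy as np


def simulated_draws(n_weights: int = 5, n_units: int = 5, n_samples: int = 8):
    """Hypothetical stand-in for the random draws of one training run.

    Everything here uses NumPy's *global* generator, the one np.random.seed() controls.
    """
    weights = np.random.uniform(-0.05, 0.05, size=n_weights)  # initial weights
    dropout_mask = np.random.rand(n_units) >= 0.5             # like Dropout(0.5): keep ~half
    order = np.random.permutation(n_samples)                  # shuffled sample order
    return weights, dropout_mask, order


np.random.seed(42)
run1 = simulated_draws()
np.random.seed(42)  # same seed, as in a fresh run of the script
run2 = simulated_draws()

print(all(np.array_equal(a, b) for a, b in zip(run1, run2)))  # True
```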
Answer 1 (score: 1)
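This is normal Keras behavior. See this discussion in the issue tracker of the keras repository on GitHub.
For example, the 9th argument of the fit function is shuffle, which defaults to True. The training data is therefore reshuffled before every epoch, which makes the result change from run to run.
Setting a random seed helps, but it still does not make the results completely identical.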
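With `shuffle=True`, the samples are visited in a new random order every epoch, so the sequence of mini-batch updates, and therefore the final weights, can differ between runs. The NumPy sketch below imitates this; `epoch_orders` is a hypothetical helper, not part of Keras, and six samples stand in for the training set.

```python
import numpy as np


def epoch_orders(n_samples: int, epochs: int, shuffle: bool, seed=None):
    """Return the sample order used in each epoch (hypothetical helper)."""
    rng = np.random.RandomState(seed)  # seed=None -> fresh, unpredictable state
    orders = []
    for _ in range(epochs):
        idx = np.arange(n_samples)
        if shuffle:
            rng.shuffle(idx)  # new permutation every epoch
        orders.append(idx)
    return orders


# shuffle=False: the same fixed order in every epoch and every run.
print(epoch_orders(6, 3, shuffle=False)[0])  # [0 1 2 3 4 5]

# shuffle=True with a fixed seed: the order changes per epoch but repeats across runs.
a = epoch_orders(6, 3, shuffle=True, seed=0)
b = epoch_orders(6, 3, shuffle=True, seed=0)
print(all(np.array_equal(x, y) for x, y in zip(a, b)))  # True
```

Passing `shuffle=False` to `fit` removes this particular source of variation, at the cost of always presenting the batches in the same order.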
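One reason a seed alone may not be enough: floating-point addition is not associative. If several threads (or GPU kernels) combine partial sums in a different order from run to run, the totals can differ in the last bits, and such tiny differences can grow during training. This is why the Keras FAQ snippet quoted further down forces TensorFlow onto a single thread. The three-number example below is only an illustration of the effect, not taken from Keras.

```python
# The same three numbers summed in two orders, as two threads might do it.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(left_first)                 # 0.6000000000000001
print(right_first)                # 0.6
print(left_first == right_first)  # False
```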
Answer 2 (score: 0)
As described in the Keras FAQ, add the following code:
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926
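import os
# Note: Python reads PYTHONHASHSEED only at interpreter startup, so setting it
# here affects subprocesses, not this process; set it in the shell to be safe.
os.environ['PYTHONHASHSEED'] = '0'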
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# Rest of code follows ...
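About the `PYTHONHASHSEED` lines: the interpreter reads this variable once at startup, so assigning `os.environ['PYTHONHASHSEED']` inside a script that is already running does not change that script's own string hashing; it only affects processes started afterwards. To make it count, set it before launching Python, for example `PYTHONHASHSEED=0 python train.py` (`train.py` is a placeholder name). The sketch below checks the behavior by starting fresh interpreters with `subprocess`; `hash_in_fresh_interpreter` is a hypothetical helper.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, hashseed: str) -> int:
    """Compute hash(text) in a new Python process started with PYTHONHASHSEED set."""
    env = dict(os.environ, PYTHONHASHSEED=hashseed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Same seed -> same string hash in every new process.
print(hash_in_fresh_interpreter("keras", "0") == hash_in_fresh_interpreter("keras", "0"))  # True
```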
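Answer 3 (score: 0)
I haven't checked on GPU, but on CPU it seems the seed cannot be fixed as above when TensorFlow 1 is the Keras backend. So we need to switch from TensorFlow 1 to TensorFlow 2; then fixing the seed works. For example, this worked for me: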
import os
import numpy as np
import random as rn
import tensorflow as tf
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(1)
rn.seed(1)
tf.random.set_seed(1)  # TF 2.x API (tf.set_random_seed was removed in 2.x)
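If the same script must run under both TensorFlow 1.x and 2.x, the seeding calls can be wrapped in one function. `set_global_seeds` below is a hypothetical helper, a sketch that assumes TensorFlow 2.x exposes `tf.random.set_seed` and 1.x exposes `tf.set_random_seed`; it skips TensorFlow entirely when it is not installed.

```python
import os
import random

import numpy as np


def set_global_seeds(seed: int) -> None:
    """Seed Python's and NumPy's RNGs, plus TensorFlow's if it is installed (hypothetical helper)."""
    # Only affects processes started after this point; the current interpreter's
    # hash seed was fixed when it started.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed: nothing more to seed
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)  # TensorFlow 2.x
    else:
        tf.set_random_seed(seed)  # TensorFlow 1.x


set_global_seeds(1)
first = (random.random(), float(np.random.rand()))
set_global_seeds(1)
second = (random.random(), float(np.random.rand()))
print(first == second)  # True
```

Call it once at the very top of the script, before the model is built, so that every later random draw starts from the seeded state.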