
What are common sources of randomness in Keras machine learning projects?

Time: 2018-08-06 21:01:40

Tags: python-3.x machine-learning scikit-learn keras reproducible-research

Reproducibility is important. I am currently trying to achieve it in an open-source machine learning project. Which parts should I look at?

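1 answer:

Answer 0 (score: 3):

Setting seeds

Computers have pseudo-random number generators, which are initialized with a value called the seed. For machine learning, you may need to do the following: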

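# I've heard the order here is important
# (Written for TensorFlow 1.x; in TensorFlow 2.x the seed call is tf.random.set_seed.)
import random
random.seed(0)  # seed Python's built-in generator

import numpy as np
np.random.seed(0)  # seed NumPy's global generator

import tensorflow as tf
tf.set_random_seed(0)  # seed TensorFlow's graph-level generator
# Run ops on a single thread; multi-threaded execution can be non-deterministic
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

from keras import backend as K
K.set_session(sess)  # tell keras about the seeded session

# now import keras stuff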

See also: Keras FAQ: How can I obtain reproducible results using Keras during development?
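
To see what a seed actually buys you, here is a minimal sketch that uses only the standard library's random module. The helper name draw is made up for this illustration; it is not part of Keras or any other library.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return [rng.random() for _ in range(n)]


first = draw(0)
second = draw(0)
other = draw(1)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```

The same idea applies to NumPy and TensorFlow: each library keeps its own generator state, which is why each one needs its own seed call.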

sklearn

sklearn.model_selection.train_test_split has a random_state parameter. Pass a fixed integer to get the same split on every run.
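
A small sketch of that, assuming NumPy and scikit-learn are installed. The toy arrays X and y are invented for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy features: 10 samples, 2 columns
y = np.arange(10)                 # toy labels

# Same random_state -> identical split on every call.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.3, random_state=0)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(np.array_equal(X_test_a, X_test_b))
print(np.array_equal(y_test_a, y_test_b))
```

Other scikit-learn objects that shuffle or sample (for example KFold with shuffle=True) take a random_state argument as well, so it is worth searching your code for every place one can be passed.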

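Things to check

Am I loading the data in the same order every time?
Am I initializing the model in the same way?
Am I using external data that might change?
Am I using external state that might change (e.g. datetime.now)?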
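
Two of these checks can be sketched with the standard library alone. The helpers list_data_files and make_run_name are hypothetical names for this illustration: the first fixes the load order by sorting, the second lets the caller pin the timestamp instead of reading the clock.

```python
import os
import tempfile
from datetime import datetime


def list_data_files(directory):
    """Return file paths in a fixed (sorted) order.

    os.listdir makes no promise about ordering, so sort explicitly.
    """
    return [os.path.join(directory, name)
            for name in sorted(os.listdir(directory))]


def make_run_name(now=None):
    """Build a run name from a timestamp that the caller can pin."""
    if now is None:
        now = datetime.now()  # only used when no fixed time is supplied
    return now.strftime("run-%Y%m%d-%H%M%S")


with tempfile.TemporaryDirectory() as tmp:
    # Create the files out of order on purpose.
    for name in ["b.csv", "a.csv", "c.csv"]:
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(p) for p in list_data_files(tmp)]

print(names)  # ['a.csv', 'b.csv', 'c.csv']
print(make_run_name(datetime(2018, 8, 6, 21, 1, 40)))  # run-20180806-210140
```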