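I have a script that initializes and trains a neural network with Keras. While I'm testing and optimizing the code, I seed the random number generators so that training results are 100% reproducible. At the same time, I use importlib.reload() to reload all the custom modules I'm changing during development. The problem is that the random sequence the first time I run the script in an IPython session can differ from the sequence on subsequent runs. I found a solution (see below), but it looks clunky. Is there a more efficient or more Pythonic way to do this? Or should I be handling the random seeds completely differently?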
import sys
modulesUnderDevelopment = ['common_config', 'cnn_tools', 'prep_dataset']
moduleShorthand = ['cc', 'ct', 'pd']
reloadTracker = dict()
# Record which modules were already imported, before we do any importing.
for moduleName, shorthand in zip(modulesUnderDevelopment, moduleShorthand):
    reloadTracker[moduleName] = {'imported': moduleName in sys.modules, 'shorthand': shorthand}
#import modules. Anything that's already been imported won't be imported by these lines,
# so any randomness that occurs during import will NOT occur if the module was
# already imported. HOWEVER, if the module has not been imported, any randomness
# in the module import will be executed.
print('imports are beginning')
import common_config as cc
import cnn_tools as ct
import prep_dataset as pd
print('imports are ending')
import importlib
#now reload the key modules that I'm optimizing and regularly changing. These reloads will
# ALWAYS happen, so the randomness in them will always be executed.
print('reloads are beginning')
for key, item in reloadTracker.items():
    if not item['imported']:
        continue
    if key == 'common_config':
        importlib.reload(cc)
    elif key == 'cnn_tools':
        importlib.reload(ct)
    elif key == 'prep_dataset':
        importlib.reload(pd)
print('reloads are complete')
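The bookkeeping above boils down to "import the module if it is new, otherwise reload it", so that each module body runs exactly once per script run. A more compact way to express that might be a small helper like the one below. This is only a sketch: `import_or_reload` and the throwaway `demo_mod` module are hypothetical names used for illustration, and the demo writes a temporary module whose body counts how many times it has been executed.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_or_reload(name: str) -> ModuleType:
    """Import `name` the first time; reload it on every later call.

    Either way the module body executes exactly once per call, so any
    import-time random draws happen the same number of times on the first
    run of the script and on every re-run in the same session.
    """
    if name in sys.modules:
        return importlib.reload(sys.modules[name])
    return importlib.import_module(name)


# Self-contained demo (hypothetical module): the body counts its executions.
# A reload re-executes the body in the existing module namespace, so RUNS
# survives from the previous execution and is incremented again.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_mod.py").write_text("RUNS = globals().get('RUNS', 0) + 1\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

first = import_or_reload("demo_mod").RUNS   # fresh import: body runs once
second = import_or_reload("demo_mod").RUNS  # already imported: reload runs it again
print(first, second)  # 1 2

# With the question's modules this would be, e.g. (hypothetical usage):
# cc, ct, pd = (import_or_reload(n) for n in ("common_config", "cnn_tools", "prep_dataset"))
```

This keeps the behaviour of the original script (a module is reloaded only if it had already been imported) without the tracker dict or the if/elif chain.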
Meanwhile, my modules contain code like this:
## prep_dataset ##
import numpy as np
print('starting prep_dataset -- {}'.format(np.random.random()))
## cnn_tools ##
import numpy as np
print('starting cnn_tools -- {}'.format(np.random.random()))
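The root of the problem is that these module bodies draw from the shared global RNG, so every import or reload shifts the stream the training code sees. One alternative, sketched here as an assumption rather than the author's approach, is to give each module its own private generator. The sketch uses the standard-library `random` module so it runs without extra dependencies; in the NumPy modules above, the analogous calls would be `np.random.seed` for the global stream and a private `np.random.default_rng(seed)` inside each module. The seed values are arbitrary.

```python
import random

# What the main script expects from the global stream (stand-in for np.random.seed).
random.seed(3)
expected = [random.random() for _ in range(2)]

# Case 1: the module body draws from the shared global stream, as in the question.
random.seed(3)
random.random()  # import-time draw, e.g. print(np.random.random())
shifted = [random.random() for _ in range(2)]
print(shifted == expected)  # False: the import consumed a number

# Case 2: the module body uses a private generator instead.
random.seed(3)
_module_rng = random.Random(12345)  # stand-in for np.random.default_rng(12345)
print("starting prep_dataset --", _module_rng.random())
actual = [random.random() for _ in range(2)]
print(actual == expected)  # True: the global stream is untouched
```

Because the private generator carries its own state, importing or reloading the module any number of times no longer changes what the rest of the script draws.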
The first time I start IPython and run this script, it prints the following:
imports are beginning
starting cnn_tools -- 0.5507979025745755
Using TensorFlow backend.
starting prep_dataset -- 0.7081478226181048
imports are ending
reloads are beginning
reloads are complete
On subsequent runs, it prints:
imports are beginning
imports are ending
reloads are beginning
starting cnn_tools -- 0.5507979025745755
starting prep_dataset -- 0.7081478226181048
reloads are complete
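So this solves my problem: the random number stream is the same both times. But it does look clunky... If instead I skip all this bookkeeping and simply reload right after importing, the first run of the script gives different results, because 2 extra random numbers are drawn from the stream: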
starting cnn_tools -- 0.5507979025745755
Using TensorFlow backend.
starting prep_dataset -- 0.7081478226181048
reloads are beginning
starting cnn_tools -- 0.2909047389129443
starting prep_dataset -- 0.510827605197663
reloads are complete
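Another way to handle the seeds differently, offered as a swapped-in technique rather than the author's method, is to move the seeding call out of import time and run it once after all imports and reloads, right before the code that must be reproducible. Then it no longer matters how many numbers the module bodies consumed. In the sketch below, `seed_everything`, `simulate_imports`, and `run_script` are hypothetical names, and the draw counts 4 and 2 only mirror the outputs above. NumPy is seeded only if it is installed; with Keras on TensorFlow you would also seed TensorFlow's RNG in the same place (for example `tf.random.set_seed` in TensorFlow 2.x), which is left out so the sketch runs without TensorFlow.

```python
import random

try:  # NumPy is optional so this sketch runs without it
    import numpy as np
except ImportError:
    np = None


def seed_everything(seed: int) -> None:
    """Seed the global RNGs the training code relies on (hypothetical helper)."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)


def simulate_imports(n_draws: int) -> None:
    """Stand-in for module bodies that consume global random numbers."""
    for _ in range(n_draws):
        random.random()


def run_script(import_time_draws: int) -> list:
    simulate_imports(import_time_draws)  # imports and reloads happen first...
    seed_everything(3)                   # ...then seed, right before the real work
    return [random.random() for _ in range(3)]


first_run = run_script(4)  # first run: import plus reload, 4 draws
later_run = run_script(2)  # later run: reload only, 2 draws
print(first_run == later_run)  # True
```

The trade-off is that any randomness inside the module bodies themselves is still unseeded relative to the training run, so this works best when modules do not depend on random numbers at import time.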