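Previously, I mentioned that BrainScript offers no option for defining a seed value in CNTK sequential machine learning models [1]. I therefore migrated my code to the CNTK Python API, which provides finer-grained options for setting seed values when defining a sequential model. Below are the places in my implementation where I use random initialization (and set the corresponding seed values):
# CNTK imports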
import numpy as np
import pandas as pd
import random
import math as m
from cntk.device import *
from cntk import Trainer
from cntk.layers import *
import cntk
import cntk.ops as o
import cntk.layers as l
# Define the random seeds
np.random.seed(8888)
random.seed(8888)
# Define the input and output training vectors
input_array_df = np.asarray(input_split_df[1:len(input_split_df)], dtype=np.float32)
output_array_df = np.asarray(output_df_df[1:len(output_df_df)], dtype=np.float32)
tup=(input_array_df, output_array_df)
listOfTuplesOfInputsLabels.append(tup)
# Shuffle the input vectors
random.shuffle(listOfTuplesOfInputsLabels)
# Define the sequential model
num_minibatches = len(features) // minibatch_size
epoch_size = len(features)*1
feature = o.input_variable((input_dim),np.float32)
label = o.input_variable((output_dim),np.float32)
netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(seed=8888)))),Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=8888))])(feature)
learner = momentum_sgd(netout.parameters, lr = learning_rate_schedule([(4,0.003),(16,0.002)], unit=UnitType.sample,epoch_size=epoch_size),
momentum=momentum_as_time_constant_schedule(minibatch_size / -m.log(0.9)), gaussian_noise_injection_std_dev = gaussian_noise,l2_regularization_weight =l2_regularization_weight)
# Split into minibatches
tf = np.array_split(features,num_minibatches)
tl = np.array_split(labels,num_minibatches)
# Train
features = np.ascontiguousarray(tf[i%num_minibatches])
labels = np.ascontiguousarray(tl[i%num_minibatches])
trainer.train_minibatch({feature : features, label : labels})
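Unfortunately, even though I was able to set the seed values in my code, I still observe small variations in the final results. Is this due to floating-point computation? Or can you spot anything in my code that still needs a seed value and that I have missed?
Thanks!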
[1] Defining a seed value in BrainScript for CNTK sequential machine learning models
Answer 0 (score: 0)
You could try the following:
from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms
set_fixed_random_seed(1)
force_deterministic_algorithms()
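On the floating-point part of the question: below is a minimal sketch, using only the Python standard library and not CNTK, of why accumulation order alone can change a result in its last bits. Floating-point addition is not associative, so if a reduction sums the same numbers in a different order between runs (an assumption about what a non-deterministic kernel might do, not a statement about CNTK internals), small differences can appear even with every seed fixed. The helper name `sum_in_order` is hypothetical.

```python
import random


def sum_in_order(values, order):
    """Accumulate values one by one, visiting indices in the given order."""
    total = 0.0
    for idx in order:
        total += values[idx]
    return total


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same numbers, two accumulation orders. The seed makes the data and the
# permutation reproducible; any difference comes from rounding alone.
rng = random.Random(8888)
values = [rng.uniform(-1.0, 1.0) for _ in range(10000)]
forward = list(range(len(values)))
shuffled = forward[:]
rng.shuffle(shuffled)

diff = abs(sum_in_order(values, forward) - sum_in_order(values, shuffled))
print(diff)  # tiny compared with the magnitude of the inputs
```

Repeating the sum in the same order always gives the same value, which is the property that forcing deterministic algorithms is meant to restore.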