Question

I have a PyTorch script similar to the following:

# Loading data
train_loader, test_loader = someDataLoaderFunction()

# Define the architecture
model = ResNet18()
model = model.cuda()  

# Get method from program argument
method = args.method

# Training
train(method, model, train_loader, test_loader)

To run the script with two different methods (method1 and method2), I simply run the following commands in two different terminals:

CUDA_VISIBLE_DEVICES=0 python program.py --method method1
CUDA_VISIBLE_DEVICES=1 python program.py --method method2

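The problem is that the data loader function above involves some randomness, which means the two methods are applied to two different sets of training data. I would like them to train on exactly the same data, so I modified the script as follows: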

# Loading data
train_loader, test_loader = someDataLoaderFunction()

# Define the architecture
model = ResNet18()
model = model.cuda()  

## Run for the first method
method = 'method1'

# Training
train(method, model, train_loader, test_loader)

## Run for the second method
method = 'method2'

# Must re-initialize the network first
model = ResNet18()
model = model.cuda()

# Training
train(method, model, train_loader, test_loader)

Is it possible to run the two methods in parallel? Thank you very much for your help!

Answer 1

I think the simplest approach is to fix the seeds, as below.

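import numpy as np
import torch

# args.seed is assumed to be parsed from the command line, like args.method
myseed = args.seed
np.random.seed(myseed)
torch.manual_seed(myseed)
torch.cuda.manual_seed(myseed)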
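A slightly fuller version of the same idea, as a minimal sketch: `seed_everything` and `sample_split` are hypothetical names, and `sample_split` is only a toy stand-in for whatever random choice `someDataLoaderFunction()` makes internally. It also seeds Python's own `random` module, on the assumption that the data pipeline might use it.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper name)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Explicitly seed every visible GPU; this is silently ignored without CUDA.
    torch.cuda.manual_seed_all(seed)


def sample_split(seed: int, n_items: int = 10, n_train: int = 8) -> list:
    """Toy stand-in for a random train/test split inside a data-loading function."""
    seed_everything(seed)
    # Pick n_train of the n_items indices using the global PyTorch RNG.
    return torch.randperm(n_items)[:n_train].tolist()


first = sample_split(0)
second = sample_split(0)
print(first == second)  # the two draws are compared after re-seeding with the same value
```

With the two-terminal setup from the question, the same seed value would have to be passed to both commands for the two runs to agree.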

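This should force the data loader to draw the same samples every time. The way to run the two methods in parallel is to use multithreading, but I hardly see that it is worth the hassle for the problem you posted.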
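For completeness, here is a rough sketch of what the multithreaded route could look like. Everything in it is an assumption made for illustration: `make_loader` stands in for `someDataLoaderFunction()`, `nn.Linear` stands in for `ResNet18()`, and the two "methods" are simply two different optimizers. Because threads share the global PyTorch RNG, the sketch gives each loader its own `torch.Generator` seeded with the same value instead of relying on `torch.manual_seed`. It falls back to the CPU when fewer than two GPUs are visible.

```python
import threading

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(seed: int) -> DataLoader:
    """Hypothetical stand-in for someDataLoaderFunction(): toy data, seeded shuffle."""
    features = torch.arange(32, dtype=torch.float32).unsqueeze(1)  # shape (32, 1)
    targets = 2.0 * features
    generator = torch.Generator().manual_seed(seed)  # private RNG for the shuffle
    dataset = TensorDataset(features, targets)
    return DataLoader(dataset, batch_size=8, shuffle=True, generator=generator)


def train(method: str, device: torch.device, seed: int, order_log: dict) -> None:
    """Train a toy model for one epoch and record the batch order that was seen."""
    loader = make_loader(seed)  # each thread builds its own identically seeded loader
    model = nn.Linear(1, 1).to(device)  # stand-in for ResNet18()
    if method == "method1":
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    else:
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    seen = []
    for x, y in loader:
        seen.append(x.flatten().tolist())
        x, y = x.to(device), y.to(device)
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    order_log[method] = seen


def run_both(seed: int = 0) -> dict:
    """Run both methods concurrently, one per device, and return the batch orders."""
    if torch.cuda.device_count() >= 2:
        devices = [torch.device("cuda:0"), torch.device("cuda:1")]
    else:
        devices = [torch.device("cpu"), torch.device("cpu")]
    order_log = {}
    threads = [
        threading.Thread(target=train, args=(method, device, seed, order_log))
        for method, device in zip(("method1", "method2"), devices)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return order_log


order_log = run_both(seed=0)
print(order_log["method1"] == order_log["method2"])
```

Note that only the data order is pinned here; the model initialisation still draws from the shared global RNG, so it is not reproducible across threads. Python threads also contend for the interpreter lock, so the two-terminal setup from the question, with the same seed passed to both processes (for example via a `--seed` flag feeding `args.seed`, assuming the script defines one), is likely the simpler way to get real parallelism.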

Running part of Python code in parallel on two different GPUs
