I have a run_command_list.txt with one command per line:
time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_1 --batch_size 1 > bs_1.log
time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_2 --batch_size 2 > bs_2.log
time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_4 --batch_size 4 > bs_4.log
...
I want to run no more than 2 jobs in parallel, and I want to set CUDA_VISIBLE_DEVICES=0 or CUDA_VISIBLE_DEVICES=1 depending on which GPU is currently free. How can I do this with parallel, or with a similar tool such as xargs?
Answer 0 (score: 1)
seq 1000 |
parallel -j2 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_{} --batch_size {} '>' bs_{}.log
Here {=1 $_=slot()-1 =} evaluates a Perl expression for each job: slot() is the job slot number (1 or 2 with -j2), so every job runs with CUDA_VISIBLE_DEVICES=0 or CUDA_VISIBLE_DEVICES=1, and at most one job occupies each slot at a time.
Answer 1 (score: 0)
You can do it like this:
function GET_AVAILABLE_DEVICE() {
[[ SOMETHING_HERE == SOMETHING ]] && echo 0 || echo 1
}
CUDA_VISIBLE_DEVICES=$( GET_AVAILABLE_DEVICE ) time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_1 --batch_size 1 > bs_1.log &
CUDA_VISIBLE_DEVICES=$( GET_AVAILABLE_DEVICE ) time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_2 --batch_size 2 > bs_2.log &
CUDA_VISIBLE_DEVICES=$( GET_AVAILABLE_DEVICE ) time python3 train.py --dataroot ./datasets/maps --name maps_pix2pix --model pix2pix --direction AtoB --checkpoints_dir maps_pix2pix_a_to_b_bs_4 --batch_size 4 > bs_4.log &
wait
You need to replace SOMETHING_HERE == SOMETHING with whatever check tells you which device is free. Note that all three jobs above are launched in the background at once, so this alone does not cap concurrency at 2; GET_AVAILABLE_DEVICE must also account for jobs that were just started.
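A plain-bash sketch of the same idea, without GNU parallel: track which PID occupies each GPU and only launch a job when a GPU is free, which caps concurrency at one job per GPU. It assumes two GPUs numbered 0 and 1, and the demo command list written below is a stand-in for the real training lines; it also relies on wait -n (bash >= 4.3).

```shell
#!/usr/bin/env bash
# Demo stand-in for run_command_list.txt; substitute the real training commands.
cat > run_command_list.txt <<'EOF'
echo trained-bs-1 > bs_1.log
echo trained-bs-2 > bs_2.log
echo trained-bs-4 > bs_4.log
EOF

declare -A busy            # gpu id -> pid of the job currently using that gpu
while IFS= read -r cmd; do
  gpu=""
  until [[ -n $gpu ]]; do
    for g in 0 1; do       # assuming two GPUs, numbered 0 and 1
      p=${busy[$g]:-}
      # A GPU is free if it has never been used or its job has exited.
      if [[ -z $p ]] || ! kill -0 "$p" 2>/dev/null; then
        gpu=$g
        break
      fi
    done
    # Both GPUs busy: block until any background job finishes, then re-check.
    [[ -z $gpu ]] && wait -n
  done
  CUDA_VISIBLE_DEVICES=$gpu bash -c "$cmd" &
  busy[$gpu]=$!
done < run_command_list.txt
wait
```

Because each command line is run via bash -c, the per-job redirects like > bs_1.log inside run_command_list.txt work as written.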