Question

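My program takes a very long time to compute, and I need to call it with different parameters. I want to run these on a server with a large number of processors, so I'd like to launch them in parallel to save time. (Each program instance uses only one processor.)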

I did my best and wrote a bash script that looks like this:

#!/bin/bash

# set maximal number of parallel jobs
MAXPAR=5
#  fill the PID array with nonsense pid numbers
for (( PAR=1; PAR<=MAXPAR; PAR++ ))
do
   PID[$PAR]=-18
done
# the loop leaves PAR at MAXPAR+1, so start checking at job slot 1
PAR=1


# loop over the arguments
for ARG in 50 60 70 90
do
   # endless loop that checks whether one of the parallel jobs has finished
   while true
   do
      # check if PID[PAR] is still running, suppress error output of kill
      if ! kill -0 ${PID[PAR]} 2> /dev/null
      then
      # if PID[PAR] is not running, the next job
      # can run as parallel job number PAR
         break
      fi

      # if it is still running, check the next parallel job
      if [ $PAR -eq $MAXPAR ]
      then
         PAR=1
      else
         PAR=$((PAR + 1))
      fi

      # but sleep 10 seconds before going on
      sleep 10
   done

   # call to the actual program (here sleep for example)
   #./complicated_program $ARG &
   sleep $ARG &

   # get the pid of the process we just started and save it as PID[PAR]
   PID[$PAR]=$!

   # give some output, so we know where we are
   echo ARG=$ARG, par=$PAR, pid=${PID[PAR]}
done

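Now, this script works, but I don't really like it.

Is there a better way to handle the beginning? (Setting PID[*]=-18 doesn't feel right to me.)
How can I wait for the first job to finish without the ugly endless loop and sleeping for several seconds? I know about wait, but I don't know how to use it here.
I'd appreciate any comments on how to improve the style and conciseness.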
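A minimal sketch of how wait could replace the polling loop, assuming bash 4.3 or newer, where `wait -n` returns as soon as any one background job exits. The number of running jobs is counted with `jobs -rp`, and the script only blocks when that count reaches MAXPAR, so no placeholder PIDs are needed. The sleep durations and MAXPAR are shortened so the demo finishes quickly, and `PIDS` is an illustrative name, not part of the original script.

```shell
#!/bin/bash
# Sketch: keep at most MAXPAR jobs running by blocking in `wait -n`
# (bash >= 4.3) instead of polling PIDs with `kill -0` and `sleep`.
MAXPAR=2
PIDS=()   # illustrative: remembers the PID of every job started

# loop over the arguments (short sleeps so the demo finishes quickly)
for ARG in 1 2 1 2
do
   # while MAXPAR jobs are still running, wait until any one of them exits
   while (( $(jobs -rp | wc -l) >= MAXPAR ))
   do
      wait -n
   done

   # call to the actual program (here sleep for example)
   sleep "$ARG" &
   PIDS+=("$!")
   echo "ARG=$ARG, pid=$!"
done

# wait for the remaining jobs before the script exits
wait
```

For the real run, replace `sleep "$ARG"` with `./complicated_program "$ARG"` and set MAXPAR to the number of processors you want to use.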

Answer 1

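I have more complex code that does more or less the same thing. Things you need to consider:

Does the user need to approve spawning new threads?
Does the user need to approve killing old threads?
Does a thread terminate on its own, or does it need to be killed?
Does the user want the script to run indefinitely, as long as it has MAXPAR threads?
If so, does the user need an escape sequence to stop further spawning?

Here is some code: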

    spawn()                                   # function that spawns a thread
    {                                         # usage: spawn 1 ls -l
        local i=$1                            # save the thread index
        shift 1                               # shift the remaining arguments to the left
        if [ -z "${thread[$i]}" ] &&          # if the thread is not already running
           [ "${#thread[@]}" -lt "$threads" ] # and we didn't reach the maximum number of threads,
        then
            "$@" &                            # run the command in the background, with all its arguments
            thread[$i]=$!                     # associate the process id with the thread index
        fi
    }

    terminate()                               # function that terminates threads
    {                                         # usage: terminate 1
        [ your condition ] &&                 # placeholder: if your condition is met,
        kill "${thread[$1]}" &&               # kill the thread and, if that succeeded,
        unset 'thread[$1]'                    # mark the thread as terminated (frees its slot)
    }

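The rest of the code depends on your needs (the things to consider above): you would loop over the input arguments and call spawn, then after some time loop over the thread indices and call terminate. Or, if the threads end on their own, loop over the input arguments and call both spawn and terminate, with the termination condition being: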

ps aux 2>/dev/null | grep -q " ${thread[$i]} "
#look for the thread id in the process list (note the spaces around the id)

Or something along those lines; you get the idea.
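A self-contained sketch of the "threads end on their own" variant described above, under these assumptions: `threads` holds the maximum number of parallel jobs, the `thread` array maps a thread index to a PID, and `reap_finished` is a hypothetical helper that frees the slots of processes that have exited. It swaps the `ps aux | grep` check for `kill -0`, which only tests whether a PID still exists and sends no signal. It still polls with sleep, so it shows the slot bookkeeping rather than a way around polling.

```shell
#!/bin/bash
# Sketch: slot array keyed by thread index; finished processes are
# detected with `kill -0` and their slots freed for the next argument.
threads=2                     # maximum number of parallel threads
declare -a thread=()          # thread index -> pid of the running process

# hypothetical helper: free every slot whose process has exited
reap_finished()
{
    local i
    for i in "${!thread[@]}"
    do
        # kill -0 only checks that the pid exists; it sends no signal
        if ! kill -0 "${thread[$i]}" 2>/dev/null
        then
            wait "${thread[$i]}" 2>/dev/null   # collect the exit status, if still pending
            unset 'thread[i]'                  # free the slot
        fi
    done
}

# loop over the arguments (short sleeps so the demo finishes quickly)
for ARG in 1 2 1 2
do
    # block until a slot is free
    while (( ${#thread[@]} >= threads ))
    do
        sleep 1
        reap_finished
    done

    # pick the first free thread index
    for (( i = 1; i <= threads; i++ ))
    do
        [ -z "${thread[$i]}" ] && break
    done

    # call to the actual program (here sleep for example)
    sleep "$ARG" &
    thread[$i]=$!
    echo "ARG=$ARG, thread=$i, pid=${thread[$i]}"
done

# wait for the remaining jobs before the script exits
wait
```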

Answer 2

Using the hint @theotherguy gave in the comments, I rewrote the script in a better way with the sem command that comes with GNU Parallel:

#!/bin/bash

# set maximal number of parallel jobs
MAXPAR=5

# loop over the arguments
for ARG in 50 60 70 90
do
   # call to the actual program (here sleep for example)
   # prefixed by sem -j $MAXPAR
   #sem -j $MAXPAR   ./complicated_program $ARG
   sem -j $MAXPAR   sleep $ARG

   # give some output, so we know where we are
   echo ARG=$ARG
done

# wait until all jobs started by sem have finished
sem --wait

Launching the same program with different arguments in parallel via bash