Question

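I have a driver script that manages a string of jobs, which it can run in parallel or sequentially according to a dependency graph. For example: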

Job              Predecessors

A                null
B                A
C                A
D                B
E                D, C
F                E

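The driver starts A in the background and waits for it to finish by suspending itself with the bash built-in suspend. On completion, job A sends SIGCONT to the driver, which then starts B and C in the background and suspends itself again, and so on.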

The driver has set -m, so job control is enabled.

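This works fine when the driver itself is started in the background. When the driver is invoked in the foreground, however, only the first call to suspend works. The second call seems to turn into an 'exit', which reports "There are stopped jobs" and does not exit. The third call to suspend also turns into an 'exit' and kills the driver and all of its children (as it should, given that this is the second converted 'exit').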

So here is my question: is this expected behavior? If so, why, and how do I work around it?

Thanks.

Code fragments below:

Driver:

            for step in $(hash_keys 'RUNNING_HASH')
            do
                    proc=$(hash_find 'RUNNING_HASH' "$step")
                    if [ -n "$proc" ]
                    then
                            # added the grep to ensure the process is found
                            if ps -p "$proc" | grep "$proc" > /dev/null 2>&1
                            then
                                    log_msg_to_stderr "$SEV_DEBUG" "proc $proc running: suspending execution"
                                    suspend
                                    # execution resumes here on receipt of SIGCONT
                                    log_msg_to_stderr "$SEV_DEBUG" "signal received: continuing execution"
                                    break
                            fi
                    fi
            done

Job:

## $$ is the driver's PID
kill -SIGCONT $$

Answer 1

Do the workers exit once they are done? If so, rather than using suspend and SIGCONT, how about simply using wait $PIDS in the driver script?
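To make that concrete, here is a rough sketch of the dependency table from the question driven purely by wait. The run_job function and the log file are made up for illustration (a real job would do real work), and the short sleep assumes a sleep that accepts fractional seconds.

```shell
#!/usr/bin/env bash
# Sketch only: run_job is a hypothetical stand-in for a real job.
log=$(mktemp)
trap 'rm -f "$log"' EXIT

run_job() {
  sleep 0.1            # placeholder for real work
  echo "$1" >> "$log"  # record the order in which jobs complete
}

run_job A & wait "$!"      # A has no predecessors
run_job B & pid_b=$!       # B and C both depend only on A,
run_job C & pid_c=$!       # so they run in parallel
wait "$pid_b"              # D depends on B
run_job D & pid_d=$!
wait "$pid_d" "$pid_c"     # E depends on D and C
run_job E & wait "$!"
run_job F & wait "$!"      # F depends on E

cat "$log"
```

Since wait blocks until the given PIDs have exited, the jobs never need to signal the driver, and set -m is not required for this to work.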

Answer 2

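I have to think you are making this more complicated than it needs to be with job control, suspend, and so on. Here is a sample program that keeps 5 children running at all times. Once a second it checks whether any of them has gone away (far more efficiently than ps | grep, BTW) and starts a new child if necessary.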

#!/usr/bin/bash

set -o monitor
# When any child exits, kill the polling sleep so the loop wakes up at once
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD

totaljobs=15   # jobs still to be started
numjobs=5      # maximum number of concurrent jobs
worktime=10
curjobs=0      # number of slots in use
declare -A pidlist

# Start one job in the given slot and record its PID
dojob()
{
  slot=$1
  time=$(echo "$RANDOM * 10 / 32768" | bc -l)
  echo "Starting job $slot with args $time"
  sleep "$time" &
  pidlist[$slot]=$!
  totaljobs=$((totaljobs - 1))
}

# start
while [ "$curjobs" -lt "$numjobs" ] && [ "$totaljobs" -gt 0 ]
 do
  dojob "$curjobs"
  # Count slots only here, so that restarts below reuse existing slots
  curjobs=$((curjobs + 1))
 done

# Poll for jobs to die, restarting while we have them
while [ "$totaljobs" -gt 0 ]
 do
  for ((i = 0; i < curjobs; i++))
   do
    if ! kill -0 "${pidlist[$i]}" >&/dev/null
     then
      dojob "$i"
      break
     fi
   done
   sleep 10.9 >&/dev/null
 done
wait
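If your bash is 4.3 or newer, the polling could probably be dropped altogether, since wait -n blocks until any one background job exits. A rough sketch under that assumption; the job counts and the sleep-based dojob are made-up stand-ins, not part of the program above.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.3+ for `wait -n`.
totaljobs=6   # hypothetical number of jobs to run in total
numjobs=3     # hypothetical limit on concurrent jobs
started=0
running=0

# Hypothetical job: sleeps for a random fraction of a second
dojob() {
  echo "Starting job $1"
  sleep "0.$((RANDOM % 5))" &
}

while [ "$started" -lt "$totaljobs" ]; do
  if [ "$running" -ge "$numjobs" ]; then
    wait -n                    # block until any one child exits
    running=$((running - 1))
  fi
  dojob "$started"
  started=$((started + 1))
  running=$((running + 1))
done
wait                           # wait for the remaining children
```

This trades the SIGCHLD trap and the kill -0 scan for a single blocking call, at the cost of no longer tracking which slot a finished job occupied.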

Problem suspending and resuming jobs
