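Checking the running status of a background process launched from the same bash script

Time: 2015-11-05 15:11:16

Tags: linux bash shell

I have to write a bash script that launches a process in the background according to the command-line argument it is given, and reports whether it was able to launch the program successfully.

Here is pseudocode for what I am trying to achieve: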

if [ "$1" = "PROG_1" ] ; then
    ./launchProg1 &
    if [ isLaunchSuccess ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
elif [ "$1" = "PROG_2" ] ; then
    ./launchProg2 &
    if [ isLaunchSuccess ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
fi

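The script cannot wait or sleep, because it will be called by another mission-critical C++ program and needs high throughput (in terms of processes started per second); moreover, the running time of the processes is unknown. The script neither needs to capture any input/output nor needs to wait for the launched process to complete.

I have unsuccessfully tried the following: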

#Method 1
if [ "$1" = "KP1" ] ; then
    echo "The Arguement is KP1"
    ./kp 'this is text' &
    if [ $? = "0" ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
elif [ "$1" = "KP2" ] ; then
    echo "The Arguement is KP2"
    ./NoSuchCommand 'this is text' &
    if [ $? = "0" ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
#Method 2
elif [ "$1" = "CD5" ] ; then
    echo "The Arguement is CD5"
    cd "doesNotExist" &
    PROC_ID=$!
    echo "PID is $PROC_ID"
    if kill -0 "$PROC_ID" ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
#Method 3
elif [ "$1" = "CD6" ] ; then
    echo "The Arguement is CD6"
    cd .. &
    PROC_ID=$!
    echo "PID is $PROC_ID"
    ps -eo pid | grep "$PROC_ID" && { echo "Success"; exit 0; }
    ps -eo pid | grep  "$PROC_ID" || { echo "failed" ; exit 1; }
else
    echo "Unknown Argument"
    exit 1
fi

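Running the script gives unreliable output. Methods 1 and 2 always return Success, while method 3 returns failed when the process finishes executing before the check is made.

Here are sample tests on GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu) and GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu):
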
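The reason Method 1 always reports Success can be shown in a few lines: in bash, `$?` read right after `cmd &` is the status of starting the asynchronous list, which is 0, not the status of `cmd` itself. The sketch below is only a demonstration; it uses `wait`, which the real script is not allowed to do, and the variable names `launch_status` and `real_status` are made up for illustration.

```shell
#!/bin/bash
# Launch a command that does not exist, in the background.
./NoSuchCommand 2>/dev/null &
launch_status=$?   # status of the '&' list itself, not of the command
pid=$!

# Collect the real exit status of the background child.
wait "$pid"
real_status=$?

echo "status right after &: $launch_status"
echo "status from wait:     $real_status"
```

Only the second value reflects the failure, which is why a check placed directly after `&` cannot tell a missing program from a working one.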
[scripts]$ ./processStarted3.sh KP1
The Arguement is KP1
Success
[scripts]$ ./processStarted3.sh KP2
The Arguement is KP2
Success
./processStarted3.sh: line 13: ./NoSuchCommand: No such file or directory
[scripts]$ ./processStarted3.sh CD6
The Arguement is CD6
PID is 25050
failed

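Unlike what similar questions suggest, I cannot use the process name, since one process may be executed several times, and the other suggestions cannot be applied either.

I have not tried screen and tmux, since getting permission to install them on the production servers will not be easy (but I will do so if that is the only option left).

Update
@ghoti
./kp is a program that exists, and launching it returns Success. ./NoSuchCommand does not exist. Still, as you can see from the (edited) output, the script incorrectly returns Success.

It does not matter when the process completes execution or whether the program terminates abnormally. Programs launched through the script are not tracked in any way (hence we do not store the pid in any table, nor is there any need to use daemontools).

@Etan Reisner
An example of a program that fails to launch would be ./NoSuchCommand, which does not exist. Or it may be a corrupted program that fails to start.

@Vorsprung
Calling a script that launches a program in the background does not take much time (and is manageable as per our expectations). But sleep 1 will accumulate over time and cause issues.

The aforementioned #Method3 works fine, barring the limitation of processes that terminate before the ps -eo pid | grep "$PROC_ID" && { echo "Success"; exit 0; } check can be performed.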
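One approach that nobody in this thread proposes, but that needs no wait or sleep, is to test the target before launching it. The sketch below assumes that the failure of interest is the ./NoSuchCommand kind (missing or non-executable file); `can_launch` is a hypothetical helper name. It cannot detect a corrupted binary that passes the permission check, and the file could still change between the check and the launch.

```shell
#!/bin/bash
# can_launch is a hypothetical helper, not part of the original script.
can_launch() {
  local prog=$1
  if [[ $prog == */* ]]; then
    # A path was given: require a regular file with execute permission.
    [[ -f $prog && -x $prog ]]
  else
    # A bare name: ask the shell whether it can resolve it
    # (this also matches builtins, functions and aliases).
    command -v "$prog" >/dev/null 2>&1
  fi
}

# Demonstration with one existing and one missing program.
for candidate in /bin/sh ./NoSuchCommand; do
  if can_launch "$candidate"; then
    echo "$candidate: launchable"
  else
    echo "$candidate: failed"
  fi
done
```

In the real script the check would sit in front of the `./kp 'this is text' &` style launch lines, with `exit 1` in the failing branch.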

4 Answers:

Answer 0: (score: 3)

Here is an example that shows whether a process was started successfully or not.

#!/bin/bash
$1 & #executes a program in background which is provided as an argument
pid=$! #stores executed process id in pid
count=$(ps -A| grep $pid |wc -l) #check whether process is still running
if [[ $count -eq 0 ]] #if process is already terminated, then there can be two cases, the process executed and stop successfully or it is terminated abnormally
then
        if wait $pid; then #checks if process executed successfully or not
                echo "success"
        else                    #process terminated abnormally
                echo "failed (returned $?)"
        fi
else
        echo "success"  #process is still running
fi

#Note: The above script will only report whether the process started successfully or not. If the process starts successfully and later terminates abnormally, this script will not provide a correct result.

Answer 1: (score: 2)

The accepted answer does not work as advertised.

The count in this check will always be at least 1, because "grep $pid" will find both the process with $pid (if it exists) and the grep itself.

count=$(ps -A| grep $pid |wc -l)
if [[ $count -eq 0 ]]
then
    ### We can never get here
else
    echo "success"  #process is still running
fi

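Changing the above to check for a count of 1, or excluding the grep from the count, should make the original work.

Here is an alternative (perhaps simpler) implementation of the original example.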

#!/bin/bash
$1 & # executes a program in background which is provided as an argument
pid=$! # stores executed process id in pid

# check whether process is still running
# The "[^[]" excludes the grep from finding itself in the ps output
if ps | grep "$pid[^[]" >/dev/null
then
    echo "success (running)"  # process is still running
else
    # If the process is already terminated, then there are 2 cases:
    # 1) the process executed and stop successfully
    # 2) it is terminated abnormally

    if wait $pid # check if process executed successfully or not
    then
        echo "success (ran)"
    else
        echo "failed (returned $?)" # process terminated abnormally
    fi
fi

# Note: The above script will detect if a process started successfully or not. If process is running when we check, but later it terminates abnormally then this script will not detect this.
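A side note on the grep-based checks in this thread: grep matches the PID as a substring, so a pid such as 123 also matches 1234. A minimal sketch of an exact-PID check, using the same `kill -0` builtin that appears in the question's Method 2, is shown below; `is_running` is a hypothetical helper name, and the `sleep 30` child only stands in for a real program.

```shell
#!/bin/bash
# is_running is a hypothetical helper: succeeds if a signal could be
# delivered to exactly this PID (signal 0 sends nothing).
is_running() {
  kill -0 "$1" 2>/dev/null
}

sleep 30 &          # stand-in for the real background program
pid=$!

if is_running "$pid"; then
  echo "success (running)"
else
  echo "not running"
fi

kill "$pid"         # clean up the stand-in process
```

Like the ps-based versions, this only says whether the child exists at the moment of the check, so a program that fails immediately can still be reported as running.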

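Answer 2: (score: 0)

Sorry, I missed the requirement "the script cannot wait or sleep".

Launch the background program and get its pid. Wait a little. Then check whether it is still running with kill -0.

The status of kill -0 is taken from $? and is used to determine whether the process is still running.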

#!/bin/bash

./$1 &
pid=$!

sleep 1;

kill -0 $pid
stat=$?
if [ $stat -eq 0 ] ; then
  echo "running as $!"
  exit 0
else
  echo "$! did not start"
  exit 1
fi

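Perhaps if your super C++ program cannot wait one second, it also cannot expect to be able to launch a bunch of shell commands at a high rate per second?

Maybe you need to implement a queue here?

Sorry, more questions than answers.

Answer 3: (score: 0)

Use jobs

Put the following in a bash script and execute it: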

#!/bin/bash

{ sleep 1 ; echo sleep1 ; } &
sleep 0
jobs
wait

echo nosleep &
sleep 0
jobs
wait

echo exit1
false &
sleep 0
jobs
wait

notexisting &
sleep 0
jobs
wait

./existingbutnotexecutable &
sleep 0
jobs
wait

Output:

$ ./testrun.sh 
[1]+  Running                 { sleep 1; echo sleep1; } &
sleep1
nosleep
[1]+  Done                    echo nosleep
exit1
[1]+  Exit 1                  false
./testrun.sh: line 19: notexisting: command not found
[1]+  Exit 127                notexisting
./testrun.sh: line 24: ./existingbutnotexecutable: Permission denied
[1]+  Exit 126                ./existingbutnotexecutable

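From the output of jobs we can distinguish between:

  • background jobs that are still running
  • jobs that have completed
  • jobs that ran and ended with a non-zero exit status
  • jobs that could not run because the command was not found
  • and jobs that could not run because the file is not executable.

Maybe there are more cases, but I did not research further.

The wait is only there to make sure that there is just one background job at a time.

The sleep 0 is necessary; otherwise jobs reports the process as still running even before the shell has been able to report the command-not-found error. I tried echo, but it does not seem to be enough of a delay.

Remove the sleep and you get this output: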

$ ./testrun.sh 
[1]+  Running                 { sleep 1; echo sleep1; } &
sleep1
[1]+  Running                 echo nosleep &
nosleep
exit1
[1]+  Running                 false &
[1]+  Running                 notexisting &
./testrun.sh: line 19: notexisting: command not found
[1]+  Running                 ./existingbutnotexecutable &
./testrun.sh: line 24: ./existingbutnotexecutable: Permission denied

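Note that jobs always says "Running", and always before the command's own output, whether it is an error or not.

Here is one possibility for acting on the output of jobs:
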
#!/bin/bash

isrunsuccess() {
  case $(jobs) in
    *Running*)   echo ">>> running" ;;
    *Done*)      echo ">>> done" ;;
    *Exit\ 127*) echo ">>> not found" ;;
    *Exit\ 126*) echo ">>> not executable" ;;
    *Exit*)      echo ">>> done nonzero exitstatus" ;;
  esac
}

{ sleep 1 ; echo sleep1 ; } &
sleep 0
isrunsuccess
wait

echo nosleep &
sleep 0
isrunsuccess
wait

echo exit1
false &
sleep 0
isrunsuccess
wait

notexisting &
sleep 0
isrunsuccess
wait

./existingbutnotexecutable &
sleep 0
isrunsuccess
wait

Output:

$ ./testrun.sh 
>>> running
sleep1
nosleep
>>> done
exit1
>>> done nonzero exitstatus
./testrun.sh: line 29: notexisting: command not found
>>> not found
./testrun.sh: line 34: ./existingbutnotexecutable: Permission denied
>>> not executable

You can merge the "did run" and "did not run" cases:

isrunsuccess() {
  case $(jobs) in
    *Exit\ 127*|*Exit\ 126*) echo ">>> did not run" ;;
    *Running*|*Done*|*Exit*) echo ">>> still running or was running" ;;
  esac
}

Output:

$ ./testrun.sh 
>>> still running or was running
sleep1
nosleep
>>> still running or was running
exit1
>>> still running or was running
./testrun.sh: line 26: notexisting: command not found
>>> did not run
./testrun.sh: line 31: ./existingbutnotexecutable: Permission denied
>>> did not run

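Other ways to check string contents in bash: How do you tell if a string contains another string in Unix shell scripting?

The bash documentation states that exit status 127 means the command was not found and 126 means the file is not executable: https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html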
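The same 126/127 convention can also be read from the exit status that `wait` returns, instead of parsing the text printed by jobs. The sketch below assumes a caller that is allowed to block on `wait`, which the original question rules out; `describe_status` is a hypothetical helper name. A program could also return 126 or 127 on its own, so the mapping is a heuristic.

```shell
#!/bin/bash
# describe_status is a hypothetical helper that maps an exit status
# to the categories distinguished in this answer.
describe_status() {
  case $1 in
    0)   echo "ran and exited successfully" ;;
    126) echo "found but not executable" ;;
    127) echo "not found" ;;
    *)   echo "ran and exited with status $1" ;;
  esac
}

./NoSuchCommand 2>/dev/null &   # missing program, as in the question
wait "$!"
describe_status "$?"
```

Compared with matching on `*Exit\ 127*`, this does not depend on the wording of the jobs output, at the price of having to wait for the child.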