Launching subshells with variable assignments and waiting for them

Time: 2021-06-05 01:24:21

Tags: bash shell

How can I launch several subshells that assign variables, and wait for all of them to finish?

#!/bin/bash

#some code about $FILE="$1"

cat "$FILE" | while read -r HOST || [[ -n $HOST ]];
do
    echo "$HOST";
    URL="http://$HOST";  QUEST1=$(curl -Is --connect-timeout 200 --max-time 200 "$URL" | head -1);
    P1=$!
    URL="https://$HOST"; QUEST2=$(curl -Is --connect-timeout 200 --max-time 200 "$URL" | head -1);
    P2=$!
    
    echo "$P1 $P2"
    wait $P1 $P2
    R1=$( echo "$QUEST1" | grep -o " 200" );
    R2=$( echo "$QUEST2" | grep -o " 200" );
    echo "$R1 $R2"
    
    if [[ "$R1" || "$R2" ]]; then
    echo "FOUND!";
    fi

done

This doesn't work. echo "$P1 $P2" is empty, because I am in a subshell. I want them to start concurrently, so that I don't have to wait for the first one to finish before the second starts.

OK, this is a basic question, but I want to understand how to apply it to other situations. Please, no external files.

EDIT For those who didn't understand: I want to run the $QUEST1 and $QUEST2 assignments in the background to save time, then wait, without using extra files. I have read a lot but haven't solved the problem. Thanks.

2 Answers:

Answer 0: (score: 3)

Here is a summary of the comments:

Assigning a subshell's output to a variable (i.e., consuming the subshell's STDOUT) means the parent shell will wait for all of the subshell's child processes to end, even if one of them is a backgrounded command.

For example:

x=$( { { /bin/sleep 10 ; echo out1; echo out2; } | head -1; } & ); \
echo "Wrong child PID : $!"

This blocks the parent shell for ten seconds. And here you get the parent shell's $!, not the one defined inside the subshell. To get the expected $!, you have to transmit it to the parent shell somehow (via STDOUT, STDERR, a file, a named pipe, etc.). You can do it via STDOUT, for example:

subpid=$( { { { /bin/sleep 10 ; echo out1; echo out2; } | head -1; } 1>&2 & } ; \
echo $!)

Here the command returns almost immediately (the parent shell no longer blocks on I/O), because the subshell sends its command output to STDERR and outputs only the child PID $! on STDOUT.
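A quick way to see the non-blocking behavior (my own sketch, using sleep in place of a real command and timing the assignment with bash's $SECONDS):

```shell
# Time the assignment itself: it should complete right away, while the
# backgrounded pipeline keeps running for 2 more seconds (output goes to stderr).
start=$SECONDS
subpid=$( { { { sleep 2; echo out1; } | head -1; } 1>&2 & } ; echo $! )
elapsed=$(( SECONDS - start ))
echo "assignment took ${elapsed}s, subshell pid: ${subpid}"
```

Without the 1>&2 redirection, the backgrounded pipeline would keep the substitution's STDOUT open and the assignment would take the full two seconds.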

Since you want to avoid I/O as much as possible, and if you only need the subshell's $! in order to wait for the child process, you can rely on the fact that the parent shell will wait for all STDOUT output from the subshell. Then your actual command is enough, with no need to know the subshell's $!:

URL="http://$HOST";  QUEST1=$(curl -Is --connect-timeout 200 --max-time 200 "$URL" \
| head -1);

However, if you need to know the subshell's child PID (note that this PID will be the subshell's PID here, not that of the curl or head commands) and wait for the subshell's command to complete, you can do something like this to get a near-deterministic order (it does not work if your subcommand does not contain at least one pipe):

x=$( { spid=$( { { { /bin/sleep 10;echo out1;echo out2; }|head -1;} 1>&2 & };echo $!);} \
2>&1 ; echo "SUBPID=$spid" )

This sets x, after ten seconds, to SUBPID=<subshell child pid> followed by out1.

At that point, this SUBPID will no longer exist (or will no longer be "your" subshell's child PID), but you can log it or do whatever you want with it.
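If you want to probe whether such a PID still exists at some later point, kill -0 is the usual test (a sketch of mine; it sends no signal, and as noted above it could also match an unrelated recycled PID):

```shell
# kill -0 delivers no signal; it only reports whether the PID currently exists.
sleep 0.2 &
probe_pid=$!
if kill -0 "${probe_pid}" 2>/dev/null; then
    echo "pid ${probe_pid} still exists"
fi
wait "${probe_pid}"
# Once the child has been reaped by wait, the PID is gone:
if kill -0 "${probe_pid}" 2>/dev/null; then
    alive_after_wait=1
else
    alive_after_wait=0
fi
echo "alive after wait: ${alive_after_wait}"
```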

Your command would look like:

URL="http://$HOST";  QUEST1=$( \
{ subpid=$( { { curl -Is --connect-timeout 200 --max-time 200 "$URL" | head -1; \
 } 1>&2 & } ;echo $!); } 2>&1 ; echo "SUBPID=$subpid" );

The first entry in QUEST1 should be SUBPID= followed by curl's first line of output.
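Splitting the SUBPID= line off the rest can then be done with parameter expansion; a sketch with a made-up PID and status line standing in for real curl output:

```shell
# Simulated capture: first line is SUBPID=<pid>, second line is curl's status line.
QUEST1="SUBPID=12345
HTTP/1.1 200 OK"
subpid="${QUEST1%%$'\n'*}"   # first line: "SUBPID=12345"
subpid="${subpid#SUBPID=}"   # strip the prefix -> "12345"
body="${QUEST1#*$'\n'}"      # everything after the first line
echo "pid=${subpid} body=${body}"
```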

To see clearly that the shell does wait, you can test it against google.com with a 10-second sleep inside:

URL="http://www.google.com";  QUEST1=$( { subpid=$( \
{ { { curl -Is --connect-timeout 200 --max-time 200 "$URL"; sleep 10; } | head -1; \
 } 1>&2 & };echo $!); } 2>&1 ; echo "SUBPID=$subpid" );

Update

After our exchange, I understand that you are looking for an asynchronous, waitable subprocess inside a subshell, whose output you need to retrieve when it finishes, all of this without using temporary files or named pipes.

There is a solution that needs no temporary files and no disk-write I/O, based on @htamas's solution to create an anonymous fifo, used as an anonymous pipe instead of a named pipe.

First, here is a simple example of this solution, followed by an implementation for your use case (many curl calls through subshells).

Example solution:

#!/bin/bash
# We use the bright solution from @htamas to create an anonymous pipe
# in the fds of our current shell.
# see: https://superuser.com/questions/184307/bash-create-anonymous-fifo
#
#
# 1. Creating the anonymous pipe
#

# start a background pipeline with two processes running forever
tail -f /dev/null | tail -f /dev/null &
# save the process ids
PID2=$!
PID1=$(jobs -p %+)
# hijack the pipe's file descriptors using procfs
exec 3>/proc/"${PID1}"/fd/1 4</proc/"${PID2}"/fd/0
# kill the background processes we no longer need
# (using disown suppresses the 'Terminated' message)
disown $PID2
kill "${PID1}" "${PID2}"
# anything we write to fd 3 can be read back from fd 4

#
# 2. Launching an "asynchronous subshell" and getting its output
#

# We set a flag to trap the async subshell termination through SIGHUP
ready=0;
trap "ready=1" SIGHUP;

# We launch our subshell for the subprocess "sleep 10" with its output
# connected to the standalone anonymous pipe.
# As the sleep command has no output, we add "starting" and "finish".
# Note that as we send the output elsewhere than STDOUT, it's non-blocking.
# Note also that we send SIGHUP to our parent shell ($$) when the command finishes.
x=$( { echo "starting"; sleep 10; echo "finish"; echo "EOF"; kill -SIGHUP $$; } >&3 & )

# We now wait for our subshell to terminate; it will terminate with the sleep command.
# While waiting, we can do stuff. Here we just display "waiting.." every second.
while [ "${ready}" = "0" ]; do
   echo "waiting for subshell..";
   sleep 1;
done;

# We close fd 3 early as we expect no more output from the subshell
exec 3>&-

# We recover our subshell's output from the read end of the anonymous pipe into y
line=""
y=$( while [ "${line}" != "EOF" ] ; do 
      read -r -u 4 line; 
      [ "${line}" != "EOF" ] && echo "${line}"; 
     done );

# And display the output of the subshell
echo "Subshell terminated, its output : ";
echo "${y}"

# close the file descriptors when we are finished (optional)
exec 4<&-

This solution requires the /proc filesystem, which is common on many modern UNIX systems. Explanations are provided as comments in the script.
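Since everything hinges on /proc, a script could guard for it up front; a minimal sketch (the flag name and message are my own):

```shell
# Check that procfs exposes this shell's file descriptors (Linux-style /proc);
# exec 3>/proc/<pid>/fd/1 cannot work without it (e.g. on macOS or some BSDs).
have_procfs=0
if [ -e "/proc/$$/fd/1" ]; then
    have_procfs=1
else
    echo "procfs not available; the anonymous-pipe trick cannot work here" >&2
fi
echo "have_procfs=${have_procfs}"
```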

Minor changes: better identification of the subshell processes, more progress information while waiting, and handling of potential subshell crashes.

Implementation for your use case:

#!/bin/bash
#
# Create the anonymous pipe.
# 
# Parameters: None.
# Returns:
#   0 : Success.
#   1 : Failed to launch tails.
#   2 : Failed to exec.
#   3 : Failed to kill tails process.
function CreateAnonymousPipe() {
  # We use the bright solution from @htamas to create an anonymous pipe
  # in the fds of our current shell.
  # see: https://superuser.com/questions/184307/bash-create-anonymous-fifo
  #
  local pid1
  local pid2
  # start a background pipeline with two processes running forever
  tail -f /dev/null | tail -f /dev/null &
  [ $? != 0 ] && return 1;
  # save the process ids
  pid2=$!
  pid1=$(jobs -p %+)
  # hijack the pipe's file descriptors using procfs
  exec 3>/proc/"${pid1}"/fd/1 4</proc/"${pid2}"/fd/0
  [ $? != 0 ] && return 2;
  # kill the background processes we no longer need
  # (using disown suppresses the 'Terminated' message)
  disown "${pid2}"
  kill "${pid1}" "${pid2}"
  [ $? != 0 ] && return 3;

  # anything we write to fd 3 can be read back from fd 4
  return 0;
}
#
# Asynchronously launch a curl process in a subshell.
# 
# Parameters: { URL } { indice }
#   URL : URL for the curl call.
#   indice : numeric identifier for this call
# Returns:
#   0 : Success.
#   1 : Missing parameters
#   2 : Failed to launch curl subprocess.
#   3 : Failed to access /proc
# STDOUT: PID of the corresponding subshell if success.
function CallCurl() {
  if [ $# != 2 ] ; then
    echo "CallCurl: URL and indice parameter are mandatory." 1>&2
    echo "          CallCurl { URL } { indice }." 1>&2
    return 1;
  fi
  [ ! -d /proc ] && return 3;
  local url="$1"
  local indice="$2"
  local subshell_PID
  # We launch our subshell for the subprocess curl with its output
  # connected to the standalone anonymous pipe.
  # The curl process output is prefixed with its indice in the URL arrays.
  # Note that the subshell first renames itself with a specific identifier, 
  # curl_<indice>, and that we escape $BASHPID to use its pid for that :
  #   1) We can't use $$ to get the subshell PID, as it is not a shell variable that
  #      can be evaluated at execution time. As it is "immutable" from the shell's
  #      point of view, it'll always be evaluated at first expansion, thus to the
  #      parent shell PID.
  #   2) We don't rename after the subshell launch using $! as its PID: by that time
  #      the subshell could have already terminated, and it's possible that another
  #      process has since been launched with this PID.
  # Note that we send its output elsewhere than STDOUT (to >&3), so it's non-blocking.
  # Note also that we send the USR1 signal to our parent shell ($$) when the command finishes.
  subshell_PID=$( { { local my_pid; 
                      eval my_pid="\${BASHPID}";
                      printf 'curl_%s' "${indice}">/proc/"${my_pid}"/comm 2>/dev/null;
                      curl -Is --connect-timeout 200 --max-time 200 "${url}" | head -1 |
                      { read -r line; echo "${indice}: ${line}"; };
                      kill -USR1 $$; 
                    } >&3 & 
                  } ; 
                  echo $!; )
  [ $? != 0 ] && return 2;
  echo "${subshell_PID}"
  return 0;
}
#
# Main URL processor, launches curl subprocesses asynchronously.
# 
# Parameters: { URL ... }
#   URL : URL to call with curl.
# Returns:
#   0 : Success.
#   1 : URL parameter(s) missing
#   2 : Failed to launch curl subprocess.
#   3 : Failed to create anonymous pipe.
# STDOUT: Processing and the outputs of the curl commands
function CurlProcessor() {
  if [ $# = 0 ] ; then
    echo "CurlProcessor: URL parameter is mandatory."  1>&2
    echo "               CurlProcessor { URL ... }." 1>&2
    return 1;
  fi
  local indice=0
  local isalive=0
  local -a URLarray
  # Feed the URL array
  while [ $# -gt 0 ] ; do URLarray+=("$1"); shift; done
  # Initialize a set of flags for each URL
  local -a ready
  for ((indice=0; indice < ${#URLarray[@]}; indice++)) ; do ready+=(0); done
  # Initialize an array of subshell PID for each URL to monitor
  local -a pid
  for ((indice=0; indice < ${#URLarray[@]}; indice++)) ; do pid+=(0); done
  # Initialize an array of subshell output for each URL 
  declare -a output
  for ((indice=0; indice < ${#URLarray[@]}; indice++)) ; do output+=(""); done
  # We create the anonymous pipe
  CreateAnonymousPipe
  [ $? != 0 ] && return 3;
  # Set a trap to catch USR1 and check which subshell are still alive through /proc
  # Local handler for the signals
  function trap_handler() {
    for indice in "${!pid[@]}" ; do
      if [ "${pid[${indice}]}" != "0" ] ; then 
        isalive="$(cat /proc/"${pid[${indice}]}"/comm 2>/dev/null)";
        [ "${isalive}" != "curl_${indice}" ] && ready[${indice}]=1;
      fi
    done
  }
  trap trap_handler USR1 2>/dev/null;
  # Now launch all the subshell
  for ((indice=0; indice < ${#URLarray[@]}; indice++)) ; do
    pid[${indice}]=$(CallCurl "${URLarray[${indice}]}" "${indice}"); 
    [ $? != 0 ] && return 2;
  done
  # We now wait for our subshells to terminate.
  # While waiting, we can do stuff. Here we just display "waiting.." every second.
  local all_finished=0
  local num_finished=0
  local last_num_finished=0
  local direct_check_timer=0
  while [ "${all_finished}" = "0" ]; do
     # We check each URL subshell flag and loop till there is at least one unfinished.
     all_finished=1
     num_finished=0
     for ((indice=0; indice < ${#ready[@]}; indice++)) ; do 
       if [ ${ready[${indice}]} = 0 ] ; then
         all_finished=0; 
       else
         ((num_finished++));
       fi
     done
     echo "waiting for subshells.. ${num_finished}/${#ready[@]} finished.";
     sleep 1;
     # In case one or more subshells have crashed and thus won't send the USR1 signal,
     # we run the handler here directly to check the states of the subshells after 5
     # seconds without any subshell termination in the interval.
     if [ "${all_finished}" = "0" ] ; then
       if [ "${last_num_finished}" = "${num_finished}" ] ; then
         ((direct_check_timer++))
         if [ "${direct_check_timer}" = "5" ] ; then
             echo "More than 5 seconds with no progress, doing a direct check."
             direct_check_timer=0 
             trap_handler
         fi
       else
         direct_check_timer=0 
       fi
     fi
     last_num_finished="${num_finished}"
  done;
  # All subshells have finished, we send EOF into the anonymous pipe
  echo "EOF" >&3
  # We close fd 3 early 
  exec 3>&-
  # We recover our subshells' outputs from the read end of the anonymous pipe
  local line=""
  local control=""
  while [ "${line}" != "EOF" ] ; do 
    read -r -u 4 line; 
    if [ "${line}" != "EOF" ] ; then
      # Each line should have "indice: " as a prefix to identify the URL associated
      indice="${line/: */}"
      if [ "${indice}" ] ; then
        control="${indice/[0-9]*/}"
        if [ "${control}" = "" ] ; then
          if [ "${output[${indice}]}" != "" ] ; then
            output[${indice}]="${output[${indice}]}\n${line/[0-9]*: /}"
          else
            output[${indice}]="${line/[0-9]*: /}"
          fi
        fi
      fi
    fi
  done
  # close the file descriptors when we are finished (optional)
  exec 4<&-
  # And display the output of the subshells
  echo "Subshells have all terminated, the output : ";
  for ((indice=0; indice < ${#URLarray[@]}; indice++)) ; do 
    echo "Output from URL ${URLarray[${indice}]} :"
    echo "${output[${indice}]}"
  done
  return 0;
}
#
# An example call of CurlProcessor
#
CurlProcessor "http://www.google.com" "http://stackoverflow.com/" "http://en.cppreference.com/"

With the example call, you get the following output:

waiting for subshells.. 0/3 finished.
waiting for subshells.. 3/3 finished.
Subshells have all terminated, the output :
Output from URL http://www.google.com :
HTTP/1.1 200 OK
Output from URL http://stackoverflow.com/ :
HTTP/1.1 301 Moved Permanently
Output from URL http://en.cppreference.com/ :
HTTP/1.1 302 Found

When Fastly is down, you get:

waiting for subshells.. 0/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
More than 5 seconds with no progress, doing a direct check.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
More than 5 seconds with no progress, doing a direct check.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
waiting for subshells.. 2/3 finished.
More than 5 seconds with no progress, doing a direct check.
waiting for subshells.. 3/3 finished.
Subshells have all terminated, the output :
Output from URL http://www.google.com :
HTTP/1.1 200 OK
Output from URL http://stackoverflow.com/ :
HTTP/1.1 503 Backend unavailable, connection timeout
Output from URL http://en.cppreference.com/ :
HTTP/1.1 302 Found

(The best moment to test the script ^^.)

Answer 1: (score: 0)

Solved it myself using @Zilog80's technique

====================================

EDIT:

This post is all wrong.

External files would be needed.

====================================

P.S. Look at his answer to understand it (the important part about STDOUT 1>&2, so that you don't create the wrong child).

I knew that in a subshell ( ) you can assign variables read out from the external script.

Grouping a block of code with braces creates no subshell. The command around curl is an anonymous function { }. (Reference: https://tldp.org)
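A minimal demonstration of that difference (my own example): an assignment in a brace group survives in the current shell, while an assignment in a subshell dies with it:

```shell
x=1
{ x=2; }                      # brace group: runs in the current shell
echo "after braces:   x=$x"   # x is now 2
( x=3 )                       # subshell: the assignment dies with the subshell
echo "after subshell: x=$x"   # x is still 2
```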


So this time you can write:

( export QUEST2=$(curl -Is --connect-timeout 2 --max-time 2 $URL | head -1) ) 2>&1 & P2=$!

I have the subshell's PID and the task completes.
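The catch, which the EDIT above concedes, is that the exported variable never makes it back to the parent shell; a minimal sketch with a canned status line instead of a real curl call:

```shell
QUEST2=""
# The background subshell runs and finishes fine...
( export QUEST2=$(echo "HTTP/1.1 200 OK" | head -1) ) 2>&1 & P2=$!
wait $P2
# ...but its assignment never reaches the parent shell:
echo "QUEST2 in parent: '${QUEST2}'"   # prints: QUEST2 in parent: ''
```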

The full code would be:


#!/bin/bash

FILE="$1"

cat "$FILE" | while read -r HOST || [[ -n $HOST ]];
do
    echo "$HOST";
    URL="http://$HOST";  ( export QUEST1=$(curl -Is --connect-timeout 2 --max-time 2 $URL | head -1) ) 2>&1 & P1=$!
    URL="https://$HOST"; ( export QUEST2=$(curl -Is --connect-timeout 2 --max-time 2 $URL | head -1) ) 2>&1 & P2=$!

    
    echo "$P1 $P2"
    wait $P1 $P2
    R1=$( echo "$QUEST1" | grep -o " 200" );
    R2=$( echo "$QUEST2" | grep -o " 200" );
    echo "$R1 $R2"
    
    if [[ "$R1" || "$R2" ]]; then
    echo "FOUND!";
    fi

done

where $FILE is textfile.txt, containing a list of hosts/IPs like:

google.com
software.net
hacking.org
nasa.gov

Now you can launch your script to check whether a site has the http or https protocol enabled

(USELESS LIKE NO OTHER :() --> but good for understanding how to fork

THANKS to @Zilog80