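Modifying a variable in a running Bash script

Date: 2018-11-22 11:40:43

Tags: bash gdb

I have a bash script that processes several years of data, so the script may take a week to finish. To speed it up, I parallelize the work by running several instances at once (each instance = 1 day of data). Each instance occupies one CPU, so I can run as many instances as there are CPUs available. Because I run this on a powerful server that is shared with other people, the number of CPUs I can use goes up and down over time. My current script is: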

#!/bin/bash
function waitpid {
   #Gather the gLABs PID background processes (Maximum processes in 
   #background as number of CPUs)
   NUMPIDS=`jobs -p|awk 'END {print NR}'`
   #A while is set because there seems to be a bug in bash that makes 
   #sometimes the "wait -n" command
   #exit even if none of provided PIDs have finished. If this happens, 
   #the while loops forces the 
   #script to wait until one of the processes is truly finished
   while [ ${NUMPIDS} -ge ${NUMCPUS} ]
   do
     #Wait for gLAB processes to finish
     PIDS="`jobs -p|awk -v ORS=" " '{print}'`"
     wait -n ${PIDS} >/dev/null 2>/dev/null
     NUMPIDS=`jobs -p|awk 'END {print NR}'`
   done
}
NUMCPUS=10
for(...) #Loop for each day
do
   day=... #Set current day variable
   #Command to execute, put in background
   gLAB_linux -input ${day}folder/${day}.input -output ${day}outfolder/${day}.output &
   #Wait for any process to finish if NUMCPUS number of processes are running in background
   waitpid 
done

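So my question is: while this script is running, is there any way to change the variable NUMCPUS to any value (for example NUMCPUS=23) without stopping the script? If possible, I would prefer a method that does not involve reading or writing a file (I would like to keep the number of temporary files at zero). I don't mind if it is a "hackish" procedure, such as the method described in this answer. In fact, I tried commands similar to the ones in that answer in gdb, but it did not work; gdb gave the following error (and crashed the program):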

(gdb) attach 23865
(gdb) call bind_variable("NUMCPUS",11,0)
'bind_variable' has unknown return type; cast the call to its declared return type
(gdb) call (int)bind_variable("NUMCPUS",11,0)
Program received signal SIGSEGV, Segmentation fault

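EDIT1: Some comments on the script:

  • gLAB_linux is a single-core processing program and is not aware of the NUMCPUS variable
  • Each gLAB_linux execution takes about 5 hours to finish, so the bash script spends most of its time inside wait -n.
  • NUMCPUS has to be a variable local to the script, because another script like this one may be running in parallel (changing only the parameters given to gLAB_linux). Therefore NUMCPUS cannot be an environment variable.
  • The only process that accesses NUMCPUS is the bash script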
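The constraints above (no temporary files, a variable private to one script, a script that mostly sits in wait -n) can also be approached with signal traps. The following is only a sketch of that idea, not something proposed in the thread: the choice of USR1 to raise the limit and USR2 to lower it is an assumption, and the script signals itself purely to demonstrate the effect.

```shell
#!/bin/bash
# Hypothetical sketch: adjust NUMCPUS in a running script through signals.
NUMCPUS=10

# Assumed convention: USR1 raises the limit by one, USR2 lowers it (never below 1).
trap 'NUMCPUS=$((NUMCPUS + 1)); echo "NUMCPUS=${NUMCPUS}" >&2' USR1
trap '(( NUMCPUS > 1 )) && NUMCPUS=$((NUMCPUS - 1)); echo "NUMCPUS=${NUMCPUS}" >&2' USR2

# Demonstration only: signal this very shell, the way
# "kill -USR1 <pid>" would be run from another terminal.
kill -USR1 $$
kill -USR1 $$
kill -USR2 $$
echo "limit is now ${NUMCPUS}"
```

Bash documents that a wait builtin interrupted by a trapped signal returns immediately with a status above 128 and then runs the trap, so a loop like the one in waitpid would re-evaluate its condition with the new value. Each script instance has its own PID, so two scripts running side by side can be adjusted independently.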

EDIT2: After @Kamil's answer, I added his suggestion of reading the number of CPUs from a file

function waitpid {
   #Look if there is a file with new number of CPUs
   if [ -s "/tmp/numCPUs_$$.txt" ]
   then
     TMPVAR=$(awk '$1>0 {printf "%d",$1} {exit}' "/tmp/numCPUs_$$.txt")
     if [ -n "${TMPVAR}" ]
     then
       NUMCPUS=${TMPVAR}
       echo "NUMCPUS=${TMPVAR}"
     fi
     rm -f "/tmp/numCPUs_$$.txt"
   fi

   #Gather the gLABs PID background processes (Maximum processes in 
   #background as number of CPUs)
   NUMPIDS=`jobs -p|awk 'END {print NR}'`
   #A while is set because there seems to be a bug in bash that makes 
   #sometimes the "wait -n" command
   #exit even if none of provided PIDs have finished. If this happens, 
   #the while loops forces the 
   #script to wait until one of the processes is truly finished
   while [ ${NUMPIDS} -ge ${NUMCPUS} ]
   do
     #Wait for gLAB processes to finish
     PIDS="`jobs -p|awk -v ORS=" " '{print}'`"
     wait -n ${PIDS} >/dev/null 2>/dev/null
     NUMPIDS=`jobs -p|awk 'END {print NR}'`
   done
}
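
The file-reading step in EDIT2 can also be done without awk. The sketch below is one possible variant, not the asker's code: read_cpu_count is a hypothetical helper name, and a mktemp file stands in for /tmp/numCPUs_$$.txt so the example can run on its own.

```shell
#!/bin/bash
# Hypothetical helper: print the first field of the first line of a file,
# but only if that field is a positive integer. Fails otherwise.
read_cpu_count() {
    local value=
    # A missing file or an empty file simply leaves "value" empty;
    # "_" swallows any further fields on the line.
    { read -r value _ < "$1"; } 2>/dev/null
    [[ $value =~ ^[1-9][0-9]*$ ]] || return 1
    printf '%d\n' "$value"
}

request=$(mktemp)            # stands in for /tmp/numCPUs_$$.txt
echo "23 extra" > "$request"

NUMCPUS=10
if new=$(read_cpu_count "$request"); then
    NUMCPUS=$new
fi
rm -f "$request"
echo "NUMCPUS=${NUMCPUS}"
```

Rejecting anything that is not a plain positive integer keeps a malformed request from reaching the numeric test in the while loop.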

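2 answers:

Answer 0: (score: 2)

The best approach is to modify your bash script so that it is aware that you changed the value. Modifying variables from inside a gdb session is just invasive and pretty much throws away other developers' work.

Below I use a file named /tmp/signal_num_cpus. If the file does not exist, the script uses the current NUMCPUS value. If the file does exist, the script reads its content, updates NUMCPUS accordingly, and then writes a notification into the file saying that numcpus was changed. If the file exists but does not contain a valid number (e.g. one outside a predefined range or smth), the script writes an error message into the file instead. Either way, the other side is told whether everything went fine or something bad happened.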

#!/bin/bash

is_not_number() { 
    # succeed (return 0) unless $1 consists only of decimal digits
    [[ ! $1 =~ ^[0-9]+$ ]]
}

# global variable to hold the number of cpus with a default value
NUMCPUS=${NUMCPUS:=5}
# this will ideally execute on each access to NUMCPUS variable
# depending on content
get_num_cpus() { 
   # I tell others that NUMCPUS is a global variable and i expect it here
   declare -g NUMCPUS
   # I will use this filename to communicate
   declare -r file="/tmp/signal_num_cpus"
   # If the file exists and is a fifo...
   if [ -p "$file" ]; then
       local tmp
       # get file contents
       tmp=$(<"$file")
       if [ -z "$tmp" ]; then
           #empty is ignored
           :;
       elif is_not_number "$tmp"; then
           echo "Error reading a number from $file" >&2
           echo "error: not a number, please give me a number!" > "$file"
       else
           # If it is ok, update the NUMCPUS value
           NUMCPUS=$tmp
           echo "ok $NUMCPUS" > "$file"  # this will block until other side starts reading
       fi
   fi
   # last but not least, let's output it
   echo "$NUMCPUS"
}

# code duplication is the worst (ok, sometimes except for databases frameworks)
get_num_bg_jobs() {
    jobs -p | wc -l
}

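waitpid() {
   # Call get_num_cpus directly, not inside $( ... ): a command substitution
   # runs in a subshell, so an update to NUMCPUS would be lost there
   while 
         get_num_cpus >/dev/null
         (( $(get_num_bg_jobs) >= NUMCPUS ))
   do
         wait -n
   done
}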

# rest of the script

NUMCPUS=10
for(...) #Loop for each day
do
   day=... #Set current day variable
   #Command to execute, put in background
   gLAB_linux -input "${day}folder/${day}.input" -output "${day}outfolder/${day}.output" &
   #Wait for any process to finish if NUMCPUS number of processes are running in background
   waitpid 
done
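
One detail matters for a function like get_num_cpus, which both assigns a global variable and prints a value: an assignment made inside $( ... ) happens in a subshell and does not reach the calling shell. The sketch below illustrates this with hypothetical names (counter, bump) that are not part of the answer's script.

```shell
#!/bin/bash
counter=5

# Increments the global and prints the new value.
bump() {
    counter=$((counter + 1))
    echo "$counter"
}

inside=$(bump)          # runs in a subshell: prints 6, the parent keeps 5
after_subst=$counter

bump >/dev/null         # runs in the current shell: counter becomes 6
after_direct=$counter

echo "inside=$inside after_subst=$after_subst after_direct=$after_direct"
# prints: inside=6 after_subst=5 after_direct=6
```

For a value that has to persist between loop iterations, the function therefore needs to be called directly, with the global read afterwards.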

The script that changes the value could look like this:

#!/bin/bash

# shared context between scripts
declare -r file="/tmp/signal_num_cpus"

mkfifo "$file"

echo 1 > "$file" # this will block until other side will start reading

IFS= read -r line < "$file"

case "$line" in
ok*) 
     read _ numcpus <<<"$line"
     echo "the script changed the number of numcpus to $numcpus"
     ;;
*)
     echo "the script errored with $line"
     ;;
esac

rm "$file"
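
The blocking behaviour that the two scripts rely on can be tried in isolation. This is a minimal sketch under stated assumptions: a background subshell plays the role of the long-running script, the path comes from mktemp -u instead of /tmp/signal_num_cpus, and the value 7 is arbitrary.

```shell
#!/bin/bash
# Throwaway FIFO path (assumption: mktemp -u is available, as on GNU and BSD).
fifo=$(mktemp -u)
mkfifo "$fifo"

# Stand-in for the long-running script: read a request, send a reply.
(
    read -r n < "$fifo"
    echo "ok $n" > "$fifo"     # blocks until the other side opens for reading
) &

echo 7 > "$fifo"               # blocks until the background side opens for reading
IFS= read -r reply < "$fifo"   # blocks until the reply is written
wait
rm -f "$fifo"
echo "$reply"
# prints: ok 7
```

Opening a FIFO for writing blocks until a reader opens it and vice versa, which is what lets the two sides take turns without polling.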

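Notes:

  • The proper way to define a function is func() { :; }. The form function func { } comes from ksh and is supported by bash only as an extension. Use func() { }.
  • The arithmetic construct (( ... )) is the better choice for numeric comparisons and arithmetic.
  • Backticks ` are discouraged for command substitution; use $( ... ) instead.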
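
The three notes can be seen together in a short sketch. The names used here (count_lines, sample, limit) are made up for illustration.

```shell
#!/bin/bash
# Function defined with name() { ... }, without the "function" keyword.
count_lines() {
    wc -l < "$1"
}

sample=$(mktemp)
printf 'a\nb\nc\n' > "$sample"

lines=$(count_lines "$sample")   # $( ... ) instead of backticks
limit=2

# (( ... )) for the numeric comparison instead of [ ... -ge ... ]
if (( lines >= limit )); then
    verdict="at or above limit"
else
    verdict="below limit"
fi

rm -f "$sample"
echo "$verdict"
# prints: at or above limit
```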

Answer 1: (score: 1)

Chapter 7.1 of GNU Parallel 2018 (https://zenodo.org/record/1146014) describes how to change the number of jobs that run in parallel while GNU Parallel is running:

echo 50% > my_jobs
/usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
sleep 1
echo 0 > my_jobs
wait

So you simply put the argument for --jobs into the file my_jobs, and GNU Parallel will read it again after each job completes.