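Slow-running script. How can I improve its speed?

Asked: 2015-05-04 00:49:43

Tags: performance awk sed zsh

How can I speed this up? It takes about 5 minutes to produce one file... It runs correctly, but I have more than 100000 files to make.

Is my use of awk or sed slowing it down? I could split this into several smaller loops and run them on multiple processors, but a single script is easier.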

#!/bin/zsh
#1000 configs per file

alpha=( a b c d e f g h i j k l m n o p q r s t u v w x y z )
m=1000 # number of configs per file
t=1 #file number
for (( i=1; i<=4; i++ )); do
  for (( j=i; j<=26; j++ )); do
    input="arc"${alpha[$i]}${alpha[$j]}
    n=1 #line number
    #length=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
    #(( length= $length + 1 ))
length=644

for ((k=1; k<=$m; k++ )); do
    echo "$hmbi" >> ~/Glycine_Tinker/configs/config$t.in
    echo "jobtype = energy" >> ~/Glycine_Tinker/configs/config$t.in
    echo "analyze_only = false" >> ~/Glycine_Tinker/configs/config$t.in
    echo "qm_path = qm_$t" >> ~/Glycine_Tinker/configs/config$t.in
    echo "mm_path = aiff_$t" >> ~/Glycine_Tinker/configs/config$t.in
    cat head.in >> ~/Glycine_Tinker/configs/config$t.in
    water=4
    echo $k
  for (( l=1; l<=$length; l++ )); do
    natom=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
    number=`sed -n ${n}p $input| awk '{printf("%d",$6)}'`
    if [[ $natom -gt 10 && $number -gt 0 ]]; then
     symbol=`sed -n ${n}p $input| awk '{printf("%s",$2)}'`
     x=`sed -n ${n}p $input| awk '{printf("%.10f",$3)}'`
     y=`sed -n ${n}p $input| awk '{printf("%.10f",$4)}'`
     z=`sed -n ${n}p $input| awk '{printf("%.10f",$5)}'`

     if [[ $water -eq 4 ]]; then
     echo "--" >> ~/Glycine_Tinker/configs/config$t.in
     echo "0 1 0.4638" >> ~/Glycine_Tinker/configs/config$t.in
     water=1
     fi


     echo "$symbol  $x  $y  $z" >> ~/Glycine_Tinker/configs/config$t.in
     (( water= $water + 1 ))
    fi
    (( n= $n + 1 ))

  done
  cat tail.in >> ~/Glycine_Tinker/configs/config$t.in
  (( t= $t + 1 ))
 done

 done

done

2 answers:

Answer 0 (score: 1)

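One thing that is killing you here is the creation of a huge number of processes, especially when they all do the same thing.

Consider running sed -n ${n}p $input only once per loop iteration.

Also consider replacing the awk calls with a shell array assignment and then accessing the individual elements.

With these two changes you should be able to get the 12 or so processes (and the shell invocations via backquotes) down to a single shell invocation and backquote.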
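
As a rough illustration of that advice, the sketch below fetches the line with one sed call and lets the shell's own read split it into fields, so no awk process is needed. The sample file and its contents are hypothetical; the field order (natom, symbol, x, y, z, number) is taken from the question's script. It uses read with a here-string rather than an array so that the same lines behave identically in zsh and bash.

```shell
# Hypothetical sample input standing in for one of the arc?? files.
input=$(mktemp)
printf '%s\n' '11 O 0.1 0.2 0.3 5' '3 H 1.0 2.0 3.0 0' > "$input"

n=1
# One sed call per iteration fetches line n ...
line=$(sed -n "${n}p" "$input")
# ... and read splits it into fields inside the shell, with no awk process.
read -r natom symbol x y z number rest <<< "$line"

if [[ $natom -gt 10 && $number -gt 0 ]]; then
  # printf is a shell builtin, so the %.10f formatting costs no extra process.
  printf '%s  %.10f  %.10f  %.10f\n' "$symbol" "$x" "$y" "$z"
fi
rm -f "$input"
```

This takes the inner loop from twelve external processes per line down to one. Reading the file directly with a while read loop would remove the remaining sed call as well.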

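Answer 1 (score: 1)

Obviously Ed's suggestion is preferable, but if you don't want to go that way, I have a couple of thoughts...

Thought 1

Rather than running echo 5 times and cat head.in onto the Glycine file, each of which causes the file to be opened, seeked (or possibly seeked) to the end, and appended to, you could do it all in one go: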

# Instead of 
hmbi=3
echo "$hmbi"            >> ~/Glycine_thing
echo "jobtype = energy" >> ~/Glycine_thing
echo "somethingelse"    >> ~/Glycine_thing
echo ...                >> ~/Glycine_thing          
echo ...                >> ~/Glycine_thing
cat  ...                >> ~/Glycine_thing

# Try this
{
  echo "$hmbi"
  echo "jobtype = energy"
  echo "somethingelse"
  echo
  echo
  cat head.in
} >> ~/Glycine_thing

# Or, better still, this
echo -e "$hmbi\njobtype = energy\nsomethingelse" >> Glycine_thing

# Or, use a here-document, as suggested by @mklement0
cat -<<EOF >>Glycine
$hmbi
jobtype = energy
next thing
EOF
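
The same grouping can be stretched over an entire config file, so that the output is opened once per file rather than once per line. A minimal sketch, assuming hypothetical stand-ins for head.in, tail.in and the output path (the echoed keys are the ones from the question):

```shell
# Hypothetical stand-ins for head.in, tail.in and the output file.
workdir=$(mktemp -d)
printf 'head line\n' > "$workdir/head.in"
printf 'tail line\n' > "$workdir/tail.in"
out="$workdir/config1.in"
hmbi=3   # placeholder value, as in the snippet above
t=1

# Everything inside the braces shares a single open of "$out".
# Note: > truncates the file; use >> to append as the question's script does.
{
  echo "$hmbi"
  echo "jobtype = energy"
  echo "analyze_only = false"
  echo "qm_path = qm_$t"
  echo "mm_path = aiff_$t"
  cat "$workdir/head.in"
  # The per-atom lines from the inner loop would be printed here.
  cat "$workdir/tail.in"
} > "$out"

result=$(cat "$out")
rm -rf "$workdir"
echo "$result"
```

Because the inner loop would sit inside the braces too, its echo calls would no longer need a redirection of their own.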

Thought 2

Rather than invoking sed and awk 5 times to find 5 parameters, let awk do what sed is doing, and do all 5 things in one go:

read symbol x y z < <(awk '...{printf "%s %.10f %.10f %.10f\n", $2,$3,$4,$5}' $input)
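
To make that one-liner concrete, here is a minimal sketch in which the elided pattern is assumed to be NR == n (select line n, as sed -n ${n}p does). The field layout follows the question's script ($1 is natom, $2 the symbol, $3 to $5 the coordinates, $6 is number), and the sample file is hypothetical.

```shell
# Hypothetical sample input standing in for one of the arc?? files.
input=$(mktemp)
printf '%s\n' '3 H 1.0 2.0 3.0 0' '11 O 0.1 0.2 0.3 5' > "$input"

n=2
# A single awk process selects line n and formats all six fields at once.
# NR == n is an assumption about the pattern elided in the answer.
read -r natom number symbol x y z < <(
  awk -v n="$n" 'NR == n { printf "%d %d %s %.10f %.10f %.10f\n", $1, $6, $2, $3, $4, $5; exit }' "$input"
)
rm -f "$input"

echo "$natom $number $symbol $x $y $z"
```

The exit after the first match stops awk from reading the rest of the file, which matters when the input holds hundreds of thousands of lines.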