Question

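I made an interesting observation: I was storing the output of a cURL command in a text file and then grepping it for certain strings. Later I changed my code to store the output in a variable instead. It turned out that this change made my script run much slower. This is really counterintuitive to me, since I always thought I/O operations were more expensive than in-memory operations. Here is the code: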

#!/bin/bash
URL="http://m.cnbc.com"
while read line; do
  UA=$line
  curl -s --location --user-agent "$UA" $URL > RAW.txt
  #RAW=`curl --location --user-agent "$UA" $URL`
  L=`grep -c -e "Advertise With Us" RAW.txt`
  #L=`echo $RAW | grep -c -e "Advertise With Us"`
  M=`grep -c -e "id='menu'><button>Menu</button>" RAW.txt`
  #M=`echo $RAW | grep -c -e "id='menu'><button>Menu</button>"`
  D=`grep -c -e "Careers" RAW.txt`
  #D=`echo $RAW | grep -c -e "Careers"`
  if [[ ( $L == 1 && $M == 0 ) && ( $D == 0 ) ]]
  then
    AC="Legacy"
  elif [[ ( $L == 0 && $M == 1 ) && ( $D == 0 ) ]]
  then
    AC="Modern"
  elif [[ ( $L == 0 && $M == 0 ) && ( $D == 1 ) ]]
  then
    AC="Desktop"
  else
    AC="Unable to Determine"
  fi
  echo $AC >> Results.txt
done < UserAgents.txt

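The commented-out lines show the store-in-a-variable approach. Any ideas why this happens? Also, is there any way to speed this script up further? Right now it takes about 8 minutes to process 2000 input entries.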

Answer 1

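Chepner is right. Read the output of each cURL call only once, flagging each of the three desired strings as you go. Here is some sample code using awk. Completely untested: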

URL="http://m.cnbc.com"
while IFS= read -r line; do
    RAW=$(curl -s --location --user-agent "$line" "$URL")

    awk '
    /Advertise With Us/ {
        L=1
    }
    /id='\''menu'\''><button>Menu<\/button>/ {
        M=1
    }
    /Careers/ {
        D=1
    }

    END {
        if (L==1 && M==0 && D==0) {
            s = "Legacy"
        }
        else if (L==0 && M==1 && D==0) {
            s = "Modern"
        }
        else if (L==0 && M==0 && D==1) {
            s = "Desktop"
        }
        else {
            s = "Unable to Determine"
        }

        print s >> "Results.txt"
    }' <<< "$RAW"   # feed the page text on stdin; a bare "$RAW" argument would be treated as a file name

done < UserAgents.txt
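
One way to avoid holding the page in a variable at all is to pipe curl straight into awk, so each response is scanned exactly once. The following is a minimal sketch under that assumption, using the same three marker strings as the question; classify_stream is a hypothetical helper name, not something from the original answer.

```bash
#!/bin/bash
# classify_stream (hypothetical helper): reads one page on stdin and
# prints the detected layout. Unset awk variables compare equal to 0,
# so a flag that was never raised counts as "not found".
classify_stream() {
    awk '
    /Advertise With Us/                      { L = 1 }
    /id='\''menu'\''><button>Menu<\/button>/ { M = 1 }
    /Careers/                                { D = 1 }
    END {
        if      (L == 1 && M == 0 && D == 0) print "Legacy"
        else if (L == 0 && M == 1 && D == 0) print "Modern"
        else if (L == 0 && M == 0 && D == 1) print "Desktop"
        else                                 print "Unable to Determine"
    }'
}
```

Inside the loop this could be used as: curl -s --location --user-agent "$line" "$URL" | classify_stream >> Results.txt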

Answer 2

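Do you really need to count the matches with grep -c? It looks like you only need to know whether a match was found. If so, you can simply use bash's built-in string comparison.

Also, it will be faster if you write to the results file outside the loop.

Try the following: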

#!/bin/bash
URL="http://m.cnbc.com"
while IFS= read -r line
do
  UA="$line"
  RAW=$(curl -s --location --user-agent "$UA" "$URL")
  [[ $RAW == *"Advertise With Us"* ]] && L=1 || L=0
  [[ $RAW == *"id='menu'><button>Menu</button>"* ]] && M=1 || M=0
  [[ $RAW == *Careers* ]] && D=1 || D=0

  if (( L==1 && M==0 && D==0 ))
  then
     AC="Legacy"
  elif (( L==0 && M==1 && D==0 ))
  then
     AC="Modern"
  elif (( L==0 && M==0 && D==1 ))
  then
     AC="Desktop"
  else
     AC="Unable to Determine"
  fi
  echo "$AC"
done < UserAgents.txt > Results.txt
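
The substring checks can also be wrapped in a function, which makes it possible to exercise the decision table without any network access. This is a minimal sketch rather than the answerer's code: classify_page is a hypothetical name, and it uses case patterns (also handled by the shell itself, with no external process) in place of [[ ... ]].

```bash
#!/bin/bash
# classify_page (hypothetical helper): takes the page text as $1 and
# prints the detected layout. No grep, echo pipe, or subshell is needed.
classify_page() {
    local raw l m d
    raw=$1; l=0; m=0; d=0

    # Each case statement raises a flag if the marker occurs anywhere in the text.
    case $raw in *"Advertise With Us"*) l=1 ;; esac
    case $raw in *"id='menu'><button>Menu</button>"*) m=1 ;; esac
    case $raw in *Careers*) d=1 ;; esac

    # Exactly one flag set selects a layout; anything else is ambiguous.
    case "$l$m$d" in
        100) echo "Legacy" ;;
        010) echo "Modern" ;;
        001) echo "Desktop" ;;
        *)   echo "Unable to Determine" ;;
    esac
}
```

Calling classify_page "Careers" prints Desktop, while text containing more than one marker (or none) falls through to "Unable to Determine".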

grep-ing a variable vs. a file: execution time
