Question

Here is a simple bash script that checks HTTP status codes:

while IFS= read -r url
    do
        urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' --max-time 5 "${url}")
        echo "$url  $urlstatus" >> urlstatus.txt
    done < "$1"

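I am reading URLs from a text file, but the script processes only one at a time, which takes too long. GNU parallel and xargs also process one line at a time (tested).

How can I process the URLs concurrently to cut the run time? In other words, I want threading over the lines of the URL file rather than over bash commands (which is what GNU parallel and xargs do).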

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this


Answer 1

You mentioned that you had no luck with GNU parallel. Maybe try it this way?

# %% is a literal % in awk's printf, so curl still receives %{http_code}
format='curl -o /dev/null --silent --head --write-out "%%{http_code} " "%s"; echo "%s"\n'

awk -v fs="$format" '{printf fs, $0, $0}' url-list.txt | parallel

Want, say, 128 concurrent processes?

awk -v fs="$format" '{printf fs, $0, $0}' url-list.txt | parallel -P128
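Since the question also mentions xargs, here is a rough sketch of the same idea using xargs -P. It assumes an xargs that supports -P (GNU and BSD xargs do, but the option is not in POSIX), and check_with_xargs is a made-up helper name, not something from the question.

```shell
#!/bin/bash
# check_with_xargs FILE [JOBS] (hypothetical helper)
# Runs up to JOBS curl processes at once, one per line of FILE,
# and prints "URL  STATUS" for each line.
check_with_xargs() {
    local file="$1" njobs="${2:-16}"
    # -I{} hands each input line to a small sh script as $1, so the URL
    # and its status code are printed together by a single printf.
    xargs -P "$njobs" -I{} sh -c '
        status=$(curl -o /dev/null --silent --head --max-time 5 \
                      --write-out "%{http_code}" "$1")
        printf "%s  %s\n" "$1" "$status"
    ' _ {} < "$file"
}

# Usage: check_with_xargs url-list.txt 128 >> urlstatus.txt
```

The output order follows completion order, not input order, so sort the result afterwards if the order matters.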

Answer 2

#!/bin/bash
while IFS= read -r LINE; do
  # The trailing & sends curl to the background, so the loop moves on immediately
  curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE" & echo
  echo " $LINE"
done < url-list.txt

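You are reading the file line by line and passing each line to curl, which fetches it; only when curl finishes is the next line read. To avoid that, you need to add & echo after the curl command so that curl runs in the background. Because curl then runs in the background, each status code is printed whenever its request completes, so it may not appear next to the URL it belongs to.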

A nasty example:

file="/tmp/url-list.txt"
echo "hello 1" >>"$file"
echo "hello 2" >>"$file"
echo "hello 3" >>"$file"
while IFS= read -r line; do
  # sleep runs in the background; the second echo prints right away
  sleep 3 && echo "I run after sleep 3 - $line" & echo "I run at the same time as sleep 3"
done < "$file"
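One drawback of backgrounding curl on its own is that the status code and the URL are printed by different commands, so they can drift apart in the output. Below is a minimal sketch, assuming bash, of a variant that captures the code inside a background subshell and prints it on the same line as its URL. The helper name check_urls and the default of 16 jobs are my own choices for illustration. It throttles in simple batches with a plain wait, which is portable but not the most efficient scheduling.

```shell
#!/bin/bash
# check_urls FILE [MAX_JOBS] (hypothetical helper, not from the answers above)
# Prints "URL  STATUS" for each non-empty line of FILE, running the checks
# in batches of at most MAX_JOBS background jobs.
check_urls() {
    local file="$1" max_jobs="${2:-16}" running=0 url
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue
        (
            # Capture the code first, then print URL and code with one echo
            # so the two always end up on the same output line.
            status=$(curl -o /dev/null --silent --head --max-time 5 \
                          --write-out '%{http_code}' "$url")
            echo "$url  $status"
        ) &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait        # let the current batch finish before starting more
            running=0
        fi
    done < "$file"
    wait                # wait for the final, possibly partial, batch
}

# Usage: check_urls url-list.txt 32 >> urlstatus.txt
```

Lines come out in completion order within each batch, so pipe the result through sort if a stable order is needed.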

Answer 3

GNU parallel and xargs also process one line at a time (tested)

Can you give an example? If you use -j, you should be able to run several processes at once.

I would write it like this:

doit() {
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
    echo "$url  $urlstatus"
}
export -f doit
cat "$1" | parallel -j0 -k doit >> urlstatus.txt

Given the input:

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this
www.google.com
pi.dk

I get the output:

Input file is txt file and lines are separated  as  000
ABC.Com  301
Bcd.Com  301
Any.Google.Com  000
Something  like this  000
www.google.com  302
pi.dk  200

That looks right:

000 if domain does not exist
301/302 for redirection
200 for success
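If you want to summarise a long urlstatus.txt, the mapping above can be turned into a small helper. This is only a sketch: describe_status is a name I made up, and the 4xx/5xx branches follow the standard HTTP status classes rather than anything observed in the output above. curl prints 000 whenever it gets no HTTP response at all, which covers a non-existent domain as well as a timeout or refused connection.

```shell
#!/bin/bash
# describe_status CODE (hypothetical helper)
# Maps a curl %{http_code} value to a short human-readable label.
describe_status() {
    case "$1" in
        000) echo "no response" ;;    # DNS failure, timeout, connection refused
        2??) echo "success" ;;
        3??) echo "redirect" ;;
        4??) echo "client error" ;;
        5??) echo "server error" ;;
        *)   echo "other" ;;
    esac
}

# Usage: label the last field of every line in urlstatus.txt
# while IFS= read -r line; do
#     echo "$line  ($(describe_status "${line##* }"))"
# done < urlstatus.txt
```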

Reading a txt file with multiple threads in bash
