I'm trying to write a shell script that downloads files with cURL and detects whether a URL returns a 404 error. If a URL is a 404, I want to save the URL (or the file name) to a text file.
The URLs have the format http://server.com/somefile[00-31].txt
I've been messing around with things I found on Google, and I currently have the following code:
#!/bin/bash
if curl --fail -O "http://server.com/somefile[00-31].mp4"
then
echo "$var" >> "files.txt"
else
echo "Saved!"
fi
Answer 0 (score: 4)
#!/bin/bash
URLFORMAT="http://server.com/somefile%02d.txt"
for num in {0..31}; do
    # build the URL and try to download it
    url=$(printf "$URLFORMAT" "$num")
    CURL=$(curl --fail -O "$url" 2>&1)
    # on failure, check whether it was a 404 and record the URL
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done
Here is a version that takes the URLs as command-line arguments:
#!/bin/sh
for url in "$@"; do
    CURL=$(curl --fail -O "$url" 2>&1)
    # on failure, check whether it was a 404 and record the URL
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done
You would use it like this: sh download-script.sh http://server.com/files{00..21}.png http://server.com/otherfiles{00..12}.gif
The range expansion is performed by the invoking shell, so this works when you run the command from bash.
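As an alternative to grepping curl's human-readable error text, the HTTP status can be read directly with curl's -w '%{http_code}' option and compared. The following is only a sketch against the question's placeholder server; the helper names (build_url, is_404, download_missing) are made up for illustration:

```shell
#!/bin/sh
# Sketch: detect a 404 by inspecting the status code that curl prints with
# -w '%{http_code}', instead of parsing the --fail error message.
# server.com and the somefile names are the question's placeholders.

# build_url NUM -> zero-padded URL like http://server.com/somefile07.txt
build_url() {
    printf 'http://server.com/somefile%02d.txt' "$1"
}

# is_404 URL -> succeeds when the server answers 404 (the body is discarded)
is_404() {
    [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "404" ]
}

# download_missing: log 404 URLs to files.txt, download everything else.
# Defined but not invoked here; call it yourself to hit the real server.
download_missing() {
    for num in $(seq 0 31); do
        url=$(build_url "$num")
        if is_404 "$url"; then
            echo "$url" >> files.txt
        else
            curl -s -O "$url"
        fi
    done
}
```

Note that this makes two requests per file (one status check, one download); for large batches the --dump-header or --fail approaches avoid the extra round trip.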
答案 1 :(得分:1)
You can use the -D, --dump-header <file> option, which captures all of the headers, including the Content-Type and the HTTP status code. Note that the header lines may end with DOS line endings (CR LF), so you may want to strip the CR characters.
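For example, the status code can be pulled out of a dumped header file like this (a sketch; status_from_headers is a made-up helper name and headers.txt an arbitrary dump file):

```shell
#!/bin/sh
# status_from_headers FILE -> the numeric status from the first line of a
# curl -D header dump, e.g. "HTTP/1.1 404 Not Found" -> 404.
# tr -d '\r' strips the DOS carriage return mentioned above.
status_from_headers() {
    head -n 1 "$1" | tr -d '\r' | awk '{print $2}'
}

# Usage sketch against the question's placeholder URL:
#   url="http://server.com/somefile00.txt"
#   curl -s -O -D headers.txt "$url"
#   [ "$(status_from_headers headers.txt)" = "404" ] && echo "$url" >> files.txt
```

One caveat: if curl follows redirects (-L), the dump contains one status line per response, so the first line may not be the final status.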