I'm trying to write a shell script that downloads files with cURL and detects whether a URL returns a 404 error. If a URL is a 404, I want to save the URL (or the file name) to a text file.
The URLs have the format http://server.com/somefile[00-31].txt
I've been messing around with things I found on Google, and I currently have the following code:
#!/bin/bash
if curl --fail -O "http://server.com/somefile[00-31].mp4"
then
echo "$var" >> "files.txt"
else
echo "Saved!"
fi
Answer 0 (score: 4)
#!/bin/bash
URLFORMAT="http://server.com/somefile%02d.txt"
for num in {0..31}; do
    # build the URL and try to download it
    url=$(printf "$URLFORMAT" "$num")
    CURL=$(curl --fail -O "$url" 2>&1)
    # on failure, check whether it was a 404 and record the URL
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done
Here is a version that takes the URLs as command-line arguments:
#!/bin/sh
for url in "$@"; do
    CURL=$(curl --fail -O "$url" 2>&1)
    # on failure, check whether it was a 404 and record the URL
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done
You would use it like this: sh download-script.sh http://server.com/files{00..21}.png http://server.com/otherfiles{00..12}.gif
The range expansion is performed by the invoking shell, so this works when you run the command from bash.
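As an alternative to grepping curl's human-readable error text, the HTTP status can be read directly with curl's -w '%{http_code}' option and compared. The following is only a sketch against the question's placeholder server; the helper names (build_url, is_404, download_missing) are made up for illustration:

```shell
#!/bin/sh
# Sketch: detect a 404 by inspecting the status code that curl prints with
# -w '%{http_code}', instead of parsing the --fail error message.
# server.com and the somefile names are the question's placeholders.

# build_url NUM -> zero-padded URL like http://server.com/somefile07.txt
build_url() {
    printf 'http://server.com/somefile%02d.txt' "$1"
}

# is_404 URL -> succeeds when the server answers 404 (the body is discarded)
is_404() {
    [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "404" ]
}

# download_missing: log 404 URLs to files.txt, download everything else.
# Defined but not invoked here; call it yourself to hit the real server.
download_missing() {
    for num in $(seq 0 31); do
        url=$(build_url "$num")
        if is_404 "$url"; then
            echo "$url" >> files.txt
        else
            curl -s -O "$url"
        fi
    done
}
```

Note that this makes two requests per file (one status check, one download); for large batches the --dump-header or --fail approaches avoid the extra round trip.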
答案 1 :(得分:1)
You can use the -D, --dump-header <file> option, which captures all of the headers, including the Content-Type and the HTTP status code. Note that the header lines may end with DOS line endings (CR LF), so you may want to strip the CR characters.
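For example, the status code can be pulled out of a dumped header file like this (a sketch; status_from_headers is a made-up helper name and headers.txt an arbitrary dump file):

```shell
#!/bin/sh
# status_from_headers FILE -> the numeric status from the first line of a
# curl -D header dump, e.g. "HTTP/1.1 404 Not Found" -> 404.
# tr -d '\r' strips the DOS carriage return mentioned above.
status_from_headers() {
    head -n 1 "$1" | tr -d '\r' | awk '{print $2}'
}

# Usage sketch against the question's placeholder URL:
#   url="http://server.com/somefile00.txt"
#   curl -s -O -D headers.txt "$url"
#   [ "$(status_from_headers headers.txt)" = "404" ] && echo "$url" >> files.txt
```

One caveat: if curl follows redirects (-L), the dump contains one status line per response, so the first line may not be the final status.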