I have a txt file containing many short URLs, one URL per line. I want to resolve each URL to get its final link; some of the URLs are redirected twice. How can I automate this so that the output is one final URL per line?

Update:

Input text file:
http://www.example.com/go/post-page-1
http://www.example.com/go/post-page-2
http://www.example.com/go/post-page-3
Required output format in the txt file:
http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name
Here is how the links redirect:

Initial URL: http://www.example.com/go/post-page
==> 301 Permanent Redirect
Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
==> 302 Temporary Redirect
Final URL: http://www.example.org/post-page-name
Here is the code I tried, but it does not resolve the URLs to the final link; it only resolves them to the intermediate link.
#!/bin/bash
rm resolved_urls.txt
for url in $(cat url.txt); do
wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done
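For context, wget follows both redirects on its own and, with -S, logs a "Location:" header for every hop, so the script above collects every hop rather than just the last one. A sketch of one fix, keeping only the last Location line per URL (the function name resolve_urls is mine, the filenames follow the question, and it assumes the canonical "Location:" spelling in wget's indented header log):

```shell
# Sketch: wget -S logs one "Location:" header per redirect hop; the
# LAST one for a given URL is the final destination.
resolve_urls() {
    # $1 = input file of URLs (one per line), $2 = output file
    : > "$2"
    while IFS= read -r url; do
        wget -S --spider "$url" 2>&1 \
            | grep 'Location:' \
            | tail -n 1 \
            | sed 's/^ *Location: *//' >> "$2"
    done < "$1"
}
```

--spider avoids downloading the page bodies; wget -S writes the header log to stderr, hence the 2>&1 before the pipe.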
Answer 0 (score: 0)
So, it's not 100% clear what you're asking for. But from what I can see, and what I'm guessing at, I think this will do it for you:
#! /bin/bash
# Use the urls.txt as your input file for wget
# Use the url-redirect.txt as your output file from wget.
wget -S -i urls.txt -o url-redirect.txt
# Grep for your "Final URL" output, extract the URL, assuming
# the output you provided is what you're looking for, and is
# uniform, and redirect to your resolved_urls.txt file.
grep 'Final URL' url-redirect.txt | cut -d ' ' -f3 > resolved_urls.txt
# Remove your trash temp file.
rm url-redirect.txt
This could be much faster without all the redirects, but I think it meets your needs.
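As an aside (not part of the answer above): curl can collapse the whole redirect chain by itself, which avoids parsing header logs entirely. A minimal sketch; the helper name final_url is mine:

```shell
# Sketch: let curl walk the redirect chain itself.
#   -L               follow every redirect
#   -s               silence the progress meter
#   -o /dev/null     discard the response body
#   -w '%{url_effective}'  print the URL curl finally landed on
final_url() {
    curl -Ls -o /dev/null -w '%{url_effective}\n' "$1"
}

# Usage against the question's files:
#   while IFS= read -r url; do final_url "$url"; done < url.txt > resolved_urls.txt
```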
Answer 1 (score: 0)
Try something like this:
#!/bin/bash
function getFinalRedirect {
    local url=$1
    while true; do
        # Headers only (-I); strip the trailing carriage return that
        # HTTP header lines carry, or it ends up glued to the URL.
        nextloc=$( curl -s -I "$url" | grep '^Location:' | tr -d '\r' )
        if [ -n "$nextloc" ]; then
            url=${nextloc##Location: }
        else
            break
        fi
    done
    echo "$url"
}
url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"
Be careful about infinite redirects. This produces:
$ ./test.bash
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects
Then, to call the function on your file:
while read -r url; do
    getFinalRedirect "$url"
done < urls.txt > finalurls.txt
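The warning about infinite redirects above can be handled with a hop cap. A sketch of a capped variant (my additions: the hops counter, the quoting, and the CR stripping; the cap of 20 is an arbitrary choice):

```shell
# Sketch: same loop as the answer, but it stops after max_hops
# redirects instead of spinning forever on a redirect loop.
getFinalRedirectCapped() {
    local url=$1
    local max_hops=20 hops=0
    while [ "$hops" -lt "$max_hops" ]; do
        # Headers only; tr strips the trailing carriage return so the
        # extracted URL compares cleanly.
        local nextloc
        nextloc=$( curl -s -I "$url" | grep -i '^Location:' | tr -d '\r' )
        if [ -n "$nextloc" ]; then
            url=${nextloc#*: }
            hops=$((hops + 1))
        else
            break
        fi
    done
    echo "$url"
}
```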