How to resolve URL redirects?

Time: 2014-08-25 11:56:20

Tags: bash redirect wget url-redirection resolveurl

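I have a txt document containing many short URLs, one per line. I want to resolve each URL to get its final link. Some of the URLs are redirected twice. How can I automate this so that the output contains one final URL per line?

Update:

Input text file: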

http://www.example.com/go/post-page-1 
http://www.example.com/go/post-page-2 
http://www.example.com/go/post-page-3 

Required output format in the txt file:

http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name

Here is how the links are redirected:

Initial URL: http://www.example.com/go/post-page
    ==> 301 Permanent Redirect

Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
    ==> 302 Temporary Redirect

Final URL: http://www.example.org/post-page-name

Here is the code I tried, but it resolves the URLs only to the intermediate link, not to the final link.

#!/bin/bash
rm resolved_urls.txt
for url in $(cat url.txt); do
        wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done

2 answers:

Answer 0: (score: 0)

It's not 100% clear what you're asking for, but from what I can see and what I'm guessing, I think this will do it for you:

#! /bin/bash
# Use the urls.txt as your input file for wget
# Use the url-redirect.txt as your output file from wget.

wget -S -i urls.txt -o url-redirect.txt

# Grep for your "Final URL" output, extract the URL, assuming
#   the output you provided is what you're looking for, and is 
#   uniform, and redirect to your resolved_urls.txt file.

grep 'Final URL' url-redirect.txt | cut -d ' ' -f3>resolved_urls.txt

# Remove your trash temp file.
rm url-redirect.txt

This could probably be made much faster without all the redirection through a temporary file, but I think it meets your needs.
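As a hedged variation on the approach above: as far as I know, wget writes a "Location: ... [following]" line to its log for every hop it follows, so the last such line should point at the final URL. The sketch below assumes that log format and assumes the Location values are absolute URLs. The names last_location and resolve_with_wget are made up for illustration, and the limit of 10 redirects is an arbitrary choice.

```shell
#!/bin/bash

# Hypothetical helper: read a wget -S log on stdin and print the target of
# the last "Location:" line, i.e. the last hop wget was sent to.
last_location() {
    awk '
        { sub(/\r$/, "") }                      # drop a trailing carriage return
        tolower($0) ~ /^ *location:/ {
            sub(/^[^:]*:[ \t]*/, "")            # strip the "Location:" prefix
            sub(/ \[following\]$/, "")          # strip the note wget appends
            last = $0
        }
        END { if (last != "") print last }
    '
}

# Hypothetical wrapper: resolve one URL with wget, falling back to the URL
# itself when the log reports no redirect at all.
resolve_with_wget() {
    local url=$1 final
    # --spider avoids saving the page; --max-redirect caps the hop count.
    final=$(wget -S --spider --max-redirect=10 "$url" 2>&1 | last_location)
    echo "${final:-$url}"
}
```

Calling resolve_with_wget once per line of urls.txt and appending the result to resolved_urls.txt would then give one final URL per line. Note that some servers answer the HEAD-style requests used by --spider differently from a normal download.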

Answer 1: (score: 0)

Try something like this:

#!/bin/bash

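function getFinalRedirect {
    local url=$1
    local nextloc
    while true; do
        # Header lines end in CRLF, so strip the carriage return, and
        # match the header name case-insensitively.
        nextloc=$( curl -s -I "$url" | tr -d '\r' | grep -i '^Location: ' )
        if [ -n "$nextloc" ]; then
            # Drop the "Location: " prefix to get the next URL.
            url=${nextloc#*: }
        else
            break
        fi
    done

    echo "$url"
}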

url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"

Beware of infinite redirects. This produces:

$ ./test.bash 
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects
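One way to guard against the infinite-redirect case mentioned above is to cap the number of hops. The following is only a sketch: fetch_headers, next_location and follow_redirects are hypothetical names, the default of 10 hops is an arbitrary assumption, and relative Location values are not handled.

```shell
#!/bin/bash

# Hypothetical helper: fetch only the response headers for one URL.
fetch_headers() {
    curl -s -I "$1"
}

# Print the value of the first Location header found on stdin, if any.
next_location() {
    awk '
        { sub(/\r$/, "") }                      # header lines end in CRLF
        tolower($0) ~ /^location:/ {
            sub(/^[^:]*:[ \t]*/, "")            # strip the header name
            print
            exit
        }
    '
}

# Follow redirects one hop at a time. Prints the final URL and returns 0,
# or returns 1 after max_hops redirects (assumed default: 10).
follow_redirects() {
    local url=$1 max_hops=${2:-10} hops=0 next
    while [ "$hops" -lt "$max_hops" ]; do
        next=$(fetch_headers "$url" | next_location)
        if [ -z "$next" ]; then
            echo "$url"
            return 0
        fi
        url=$next
        hops=$((hops + 1))
    done
    echo "gave up after $max_hops redirects: $url" >&2
    return 1
}
```

Keeping the header fetch in its own small function means the loop can be exercised with canned headers instead of a live server, and a nonzero return status lets the caller tell a redirect loop apart from a resolved URL.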

Then, to call getFinalRedirect on each URL in your file:

while read -r url; do
    getFinalRedirect "$url"
done < urls.txt > finalurls.txt
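As an alternative to following the Location headers by hand, curl can follow redirects itself with -L and report the URL it ended up at through the url_effective write-out variable. This is a minimal sketch rather than a drop-in replacement: resolve_url and resolve_all are hypothetical names, and the cap of 10 redirects is an assumption.

```shell
#!/bin/bash

# Hypothetical helper: let curl follow the redirect chain and print only
# the URL of the last request it made.
resolve_url() {
    # -s silent, -L follow redirects, -o /dev/null discard the body,
    # --max-redirs caps the chain, -w prints the effective (final) URL.
    curl -s -L --max-redirs 10 -o /dev/null -w '%{url_effective}\n' "$1"
}

# Read URLs from stdin, one per line, and print one resolved URL per line.
resolve_all() {
    local url
    # read with the default IFS trims the trailing spaces some lines carry.
    while read -r url; do
        [ -n "$url" ] || continue    # skip blank lines
        resolve_url "$url"
    done
}
```

With the functions defined, running resolve_all < urls.txt > finalurls.txt would produce the same kind of output file as the loop above. Unlike the header-only approach, this issues normal GET requests, which some servers handle more reliably than HEAD.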