I have a txt file containing many short URLs, one URL per line. I want to resolve each URL to get its final link; some of the URLs are redirected twice. How can I automate this so that the output is one final URL per line?

Update:

Input text file:
http://www.example.com/go/post-page-1
http://www.example.com/go/post-page-2
http://www.example.com/go/post-page-3
Required output format in the txt file:
http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name
Here is how the links redirect:

Initial URL: http://www.example.com/go/post-page
==> 301 Permanent Redirect
Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
==> 302 Temporary Redirect
Final URL: http://www.example.org/post-page-name
Here is the code I tried, but it does not resolve the URLs to the final link; it only resolves them to the intermediate link.
#!/bin/bash
rm resolved_urls.txt
for url in $(cat url.txt); do
wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done
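For context, wget follows both redirects on its own and, with -S, logs a "Location:" header for every hop, so the script above collects every hop rather than just the last one. A sketch of one fix, keeping only the last Location line per URL (the function name resolve_urls is mine, the filenames follow the question, and it assumes the canonical "Location:" spelling in wget's indented header log):

```shell
# Sketch: wget -S logs one "Location:" header per redirect hop; the
# LAST one for a given URL is the final destination.
resolve_urls() {
    # $1 = input file of URLs (one per line), $2 = output file
    : > "$2"
    while IFS= read -r url; do
        wget -S --spider "$url" 2>&1 \
            | grep 'Location:' \
            | tail -n 1 \
            | sed 's/^ *Location: *//' >> "$2"
    done < "$1"
}
```

--spider avoids downloading the page bodies; wget -S writes the header log to stderr, hence the 2>&1 before the pipe.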
Answer 0 (score: 0)
So, it's not 100% clear what you're asking for. But from what I can see, and what I'm guessing at, I think this will do it for you:
#! /bin/bash
# Use the urls.txt as your input file for wget
# Use the url-redirect.txt as your output file from wget.
wget -S -i urls.txt -o url-redirect.txt
# Grep for your "Final URL" output, extract the URL, assuming
# the output you provided is what you're looking for, and is
# uniform, and redirect to your resolved_urls.txt file.
grep 'Final URL' url-redirect.txt | cut -d ' ' -f3 > resolved_urls.txt
# Remove your trash temp file.
rm url-redirect.txt
This could be much faster without all the redirects, but I think it meets your needs.
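As an aside (not part of the answer above): curl can collapse the whole redirect chain by itself, which avoids parsing header logs entirely. A minimal sketch; the helper name final_url is mine:

```shell
# Sketch: let curl walk the redirect chain itself.
#   -L               follow every redirect
#   -s               silence the progress meter
#   -o /dev/null     discard the response body
#   -w '%{url_effective}'  print the URL curl finally landed on
final_url() {
    curl -Ls -o /dev/null -w '%{url_effective}\n' "$1"
}

# Usage against the question's files:
#   while IFS= read -r url; do final_url "$url"; done < url.txt > resolved_urls.txt
```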
Answer 1 (score: 0)
Try something like this:
#!/bin/bash
function getFinalRedirect {
    local url=$1
    while true; do
        # Headers only (-I); strip the trailing carriage return that
        # HTTP header lines carry, or it ends up glued to the URL.
        nextloc=$( curl -s -I "$url" | grep '^Location:' | tr -d '\r' )
        if [ -n "$nextloc" ]; then
            url=${nextloc##Location: }
        else
            break
        fi
    done
    echo "$url"
}
url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"
Be careful about infinite redirects. This produces:
$ ./test.bash
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects
Then, to call the function on your file:
while read -r url; do
    getFinalRedirect "$url"
done < urls.txt > finalurls.txt
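The warning about infinite redirects above can be handled with a hop cap. A sketch of a capped variant (my additions: the hops counter, the quoting, and the CR stripping; the cap of 20 is an arbitrary choice):

```shell
# Sketch: same loop as the answer, but it stops after max_hops
# redirects instead of spinning forever on a redirect loop.
getFinalRedirectCapped() {
    local url=$1
    local max_hops=20 hops=0
    while [ "$hops" -lt "$max_hops" ]; do
        # Headers only; tr strips the trailing carriage return so the
        # extracted URL compares cleanly.
        local nextloc
        nextloc=$( curl -s -I "$url" | grep -i '^Location:' | tr -d '\r' )
        if [ -n "$nextloc" ]; then
            url=${nextloc#*: }
            hops=$((hops + 1))
        else
            break
        fi
    done
    echo "$url"
}
```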