Resolving relative URLs against a base URL's domain in PHP

Date: 2016-03-15 11:19:45

Tags: php

I fetch some HTML from another website using:

<?php
    $content =  file_get_contents('http://something.net/path/test.php');
?>

This gives me the following markup:

<div class="main"><a href="/testother.php?abhijit=1">Test One</a></div>
<div class="main"><a href="top.php?kumar=1">Test One</a></div>
<div class="main"><a href="/testother.php?abhijit=3">Test One</a></div>
<div class="main"><a href="ww.php?kumar=1">Test One</a></div>

I use a regular expression to extract all the href attributes:

/testother.php?abhijit=1

top.php?kumar=1

/testother.php?abhijit=3

ww.php?kumar=1
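
The regex itself is not shown in the question. As a minimal sketch of one way to collect the href values, assuming double-quoted attributes and a hypothetical pattern of my own (a DOM parser would likely be more robust on real-world HTML):

```php
<?php
// Sample markup copied from the question.
$content = <<<'HTML'
<div class="main"><a href="/testother.php?abhijit=1">Test One</a></div>
<div class="main"><a href="top.php?kumar=1">Test One</a></div>
<div class="main"><a href="/testother.php?abhijit=3">Test One</a></div>
<div class="main"><a href="ww.php?kumar=1">Test One</a></div>
HTML;

// Hypothetical pattern: captures the value of a double-quoted href on <a> tags.
$pattern = '/<a\s[^>]*?href="([^"]*)"/i';

preg_match_all($pattern, $content, $matches);

// $matches[1] holds the first capture group of every match.
$hrefs = $matches[1];

print_r($hrefs);
```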

None of these links includes the domain or the full path.

I would like all of them as complete, absolute links, like the ones below:

http://something.net/testother.php?abhijit=1

http://something.net/path/top.php?kumar=1

http://something.net/testother.php?abhijit=3

http://something.net/path/ww.php?kumar=1

How can I format all my links this way?

I want to build them in PHP from my base URL and the href attribute values.

(Thanks in advance.)

1 answer:

Answer 0: (score: 1)

Converting a relative path to an absolute URL with PHP

function rel2abs($rel, $base) {
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

    /* queries */
    if ($rel[0] == '?') return explode("?", $base)[0] . $rel;

    /* anchors */
    if ($rel[0] == '#') return explode("#", $base)[0] . $rel;

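    /* parse base URL into $scheme, $host, $path (path defaults to '' when the base has none) */
    $parts  = parse_url($base);
    $scheme = $parts['scheme'];
    $host   = $parts['host'];
    $path   = isset($parts['path']) ? $parts['path'] : '';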

    /* Url begins with // (substr avoids an offset warning when $rel is just "/") */
    if (substr($rel, 0, 2) == '//') {
        return "$scheme:$rel";
    }

    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);

    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';

    /* dirty absolute URL */
    $abs = "$host$path/$rel";

    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}

    /* absolute URL is ready! */
    return "$scheme://$abs";
}
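
The cleanup loop near the end of the function is the least obvious part. Here is a small sketch that runs the same two patterns on a sample "dirty" URL and records the result after each pass. The sample string and the `$passes` array are only for illustration.

```php
<?php
// Dirty URL as "$host$path/$rel" would build it for "./../../testother.php".
$abs = 'something.net/folder1/folder2/folder3/./../../testother.php';

// Same two patterns as in rel2abs(): collapse "//" or "/./", then "/dir/../".
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');

$passes = array();
do {
    // $n receives the number of replacements made in this pass.
    $abs = preg_replace($re, '/', $abs, -1, $n);
    $passes[] = $abs;
} while ($n > 0);

// One line per pass; the loop stops after a pass that changes nothing.
echo implode("\n", $passes), "\n";
```

Each pass removes at most one "dir/.." pair at a given spot, which is why the replacement has to repeat until the count drops to zero.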

Testing:

echo '<h4>Queries</h4>';
echo rel2abs("?query=1", "http://something.net/path/test.php");
echo '<br>';
echo rel2abs("?query=1", "http://something.net/path/test.php?old_query=1");

echo '<h4>Anchors</h4>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1");
echo '<br>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1#something");

echo '<h4>Path</h4>';
echo rel2abs("/testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("testother.php", "http://something.net/folder1/folder2/folder3/test.php");

echo '<h4>Url begins with //</h4>';
echo rel2abs("//google.com/path/", "https://something.net/path/test.php");
echo '<br>';
echo rel2abs("//google.com/path/", "http://something.net/path/test.php");

Test output:

Queries

http://something.net/path/test.php?query=1
http://something.net/path/test.php?query=1

Anchors

http://something.net/path/test.php?a=1#newAnchores
http://something.net/path/test.php?a=1#newAnchores

Path

http://something.net/testother.php
http://something.net/folder1/testother.php
http://something.net/folder1/folder2/testother.php
http://something.net/folder1/folder2/folder3/testother.php
http://something.net/folder1/folder2/folder3/testother.php

Url begins with //

https://google.com/path/
http://google.com/path/
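
For the four links from the question, a sketch along these lines should work. The function is repeated here in condensed form only so the snippet runs on its own, and the `$hrefs` and `$base` variables are assumptions standing in for the asker's regex result and fetched URL.

```php
<?php
// Condensed copy of rel2abs() from this answer, included so the snippet is self-contained.
function rel2abs($rel, $base) {
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    if ($rel[0] == '?') return explode('?', $base)[0] . $rel;
    if ($rel[0] == '#') return explode('#', $base)[0] . $rel;

    $parts  = parse_url($base);
    $scheme = $parts['scheme'];
    $host   = $parts['host'];
    $path   = isset($parts['path']) ? $parts['path'] : '';

    if (substr($rel, 0, 2) == '//') return "$scheme:$rel";

    $path = preg_replace('#/[^/]*$#', '', $path);
    if ($rel[0] == '/') $path = '';

    $abs = "$host$path/$rel";
    $re  = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}

    return "$scheme://$abs";
}

// Assumed inputs: the page that was fetched and the hrefs extracted from it.
$base  = 'http://something.net/path/test.php';
$hrefs = array(
    '/testother.php?abhijit=1',
    'top.php?kumar=1',
    '/testother.php?abhijit=3',
    'ww.php?kumar=1',
);

// Resolve every href against the base URL.
$absolute = array();
foreach ($hrefs as $href) {
    $absolute[] = rel2abs($href, $base);
}

echo implode("\n", $absolute), "\n";
```

Root-relative links (starting with "/") resolve against the host only, while the others resolve against the directory of the fetched page, here /path/.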