I fetched some HTML from another website using:
<?php
$content = file_get_contents('http://something.net/path/test.php');
?>
From this I get the following markup:
<div class="main"><a href="/testother.php?abhijit=1">Test One</a></div>
<div class="main"><a href="top.php?kumar=1">Test One</a></div>
<div class="main"><a href="/testother.php?abhijit=3">Test One</a></div>
<div class="main"><a href="ww.php?kumar=1">Test One</a></div>
I used a regular expression to extract all the href attributes:
/testother.php?abhijit=1
top.php?kumar=1
/testother.php?abhijit=3
ww.php?kumar=1
None of these links include the domain or the base path.
I want to turn all of them into fully qualified URLs, like below:
http://something.net/testother.php?abhijit=1
http://something.net/path/top.php?kumar=1
http://something.net/testother.php?abhijit=3
http://something.net/path/ww.php?kumar=1
How can I convert all my links in PHP, using my main URL and the href attribute values?
(Thanks in advance.)
Answer 0 (score: 1)
Converting a relative path to an absolute URL with PHP:
function rel2abs($rel, $base) {
    /* an empty reference resolves to the base URL without its fragment */
    if ($rel === '') return explode("#", $base)[0];
    /* return if already an absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    /* queries: replace the query string of the base URL */
    if ($rel[0] == '?') return explode("?", $base)[0] . $rel;
    /* anchors: replace the fragment of the base URL */
    if ($rel[0] == '#') return explode("#", $base)[0] . $rel;
    /* parse base URL into $scheme, $host, $path (the path may be absent) */
    $parts = parse_url($base);
    $scheme = $parts['scheme'];
    $host = $parts['host'];
    $path = isset($parts['path']) ? $parts['path'] : '';
    /* URL begins with // (protocol-relative): reuse the base scheme */
    if (substr($rel, 0, 2) == '//') {
        return "$scheme:$rel";
    }
    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);
    /* destroy path if relative URL points to root */
    if ($rel[0] == '/') $path = '';
    /* dirty absolute URL (note: any port in the base URL is not carried over) */
    $abs = "$host$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' until nothing changes */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}
    /* absolute URL is ready! */
    return "$scheme://$abs";
}
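The normalization loop near the end of rel2abs() is probably its least obvious step, so here is a minimal sketch that isolates it. The helper name collapse_segments is hypothetical (it is not part of the answer's code); the two patterns are copied from the function above.

```php
<?php
// Hypothetical helper: isolates the cleanup loop used by rel2abs().
function collapse_segments($url) {
    $re = array(
        '#(/\.?/)#',               // matches '//' and '/./'
        '#/(?!\.\.)[^/]+/\.\./#',  // matches '/segment/../' unless the segment starts with '..'
    );
    // preg_replace() stores the number of replacements in $n;
    // repeat until a full pass replaces nothing.
    do {
        $url = preg_replace($re, '/', $url, -1, $n);
    } while ($n > 0);
    return $url;
}

// prints something.net/folder1/testother.php
echo collapse_segments('something.net/folder1/folder2/./../testother.php'), "\n";
```

Because the patterns run over the whole string, a `//` inside a query string (for example `?u=http://example.com`) would be collapsed as well, so this approach is best suited to links whose query strings contain no such sequences.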
Tests...
echo '<h4>Queries</h4>';
echo rel2abs("?query=1", "http://something.net/path/test.php");
echo '<br>';
echo rel2abs("?query=1", "http://something.net/path/test.php?old_query=1");
echo '<h4>Anchors</h4>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1");
echo '<br>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1#something");
echo '<h4>Path</h4>';
echo rel2abs("/testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<h4>Url begins with //</h4>';
echo rel2abs("//google.com/path/", "https://something.net/path/test.php");
echo '<br>';
echo rel2abs("//google.com/path/", "http://something.net/path/test.php");
Test output...
Queries
http://something.net/path/test.php?query=1
http://something.net/path/test.php?query=1
Anchors
http://something.net/path/test.php?a=1#newAnchores
http://something.net/path/test.php?a=1#newAnchores
Path
http://something.net/testother.php
http://something.net/folder1/testother.php
http://something.net/folder1/folder2/testother.php
http://something.net/folder1/folder2/folder3/testother.php
http://something.net/folder1/folder2/folder3/testother.php
Url begins with //
https://google.com/path/
http://google.com/path/
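To tie this back to the question, the following sketch runs the four href values from the question through rel2abs(). It makes some assumptions: the sample markup is inlined rather than fetched, the extraction pattern is a simplistic one that only handles double-quoted href values, and the function is repeated in condensed form so the snippet runs on its own.

```php
<?php
// Condensed copy of the answer's rel2abs() so this snippet is self-contained.
function rel2abs($rel, $base) {
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    if ($rel[0] == '?') return explode('?', $base)[0] . $rel;
    if ($rel[0] == '#') return explode('#', $base)[0] . $rel;
    $parts  = parse_url($base);
    $scheme = $parts['scheme'];
    $host   = $parts['host'];
    $path   = isset($parts['path']) ? $parts['path'] : '';
    if (substr($rel, 0, 2) == '//') return "$scheme:$rel";
    $path = preg_replace('#/[^/]*$#', '', $path);   // drop the file name
    if ($rel[0] == '/') $path = '';                 // root-relative link
    $abs = "$host$path/$rel";
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}
    return "$scheme://$abs";
}

// Base URL and sample markup taken from the question (inlined, not fetched).
$base = 'http://something.net/path/test.php';
$content = <<<HTML
<div class="main"><a href="/testother.php?abhijit=1">Test One</a></div>
<div class="main"><a href="top.php?kumar=1">Test One</a></div>
<div class="main"><a href="/testother.php?abhijit=3">Test One</a></div>
<div class="main"><a href="ww.php?kumar=1">Test One</a></div>
HTML;

// Simplistic pattern: assumes double-quoted href values, as in the sample.
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $content, $matches);

$absolute = array();
foreach ($matches[1] as $href) {
    $absolute[] = rel2abs($href, $base);
}
echo implode("\n", $absolute), "\n";
```

With this base URL, the root-relative links resolve against the host (for example http://something.net/testother.php?abhijit=1), while the others resolve against /path/ (for example http://something.net/path/top.php?kumar=1).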