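Here is my code. It is partly based on several different snippets that you can easily find in various places with a Google search. I am trying to count the internal links, external links, all links, and (TO DO) nofollow links on any web page. This is what I have so far. Most results are correct, but some generic calls give me odd results, and I still need to handle nofollow and _blank. If you would like to comment, or add or change anything with some explanation of the logic, please do; it would be much appreciated.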
<?php
// transform to absolute path function...
function path_to_absolute($rel, $base)
{
/* return if already absolute URL */
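if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
/* protocol-relative URL (//host/path): reuse the scheme of the base URL */
if (strpos($rel, '//') === 0) return parse_url($base, PHP_URL_SCHEME).':'.$rel;
/* an empty href resolves to the base URL itself */
if ($rel === '') return $base;
/* queries and anchors */
if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;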
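/* parse base URL into local variables $scheme, $host, $path;
a base such as http://example.com has no path component, so default it to '' */
$parts = parse_url($base);
$scheme = isset($parts['scheme']) ? $parts['scheme'] : '';
$host = isset($parts['host']) ? $parts['host'] : '';
$path = isset($parts['path']) ? $parts['path'] : '';
/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);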
/* destroy path if relative url points to root */
if ($rel[0] == '/') $path = '';
/* dirty absolute URL */
$abs = "$host$path/$rel";
/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
/* absolute URL is ready! */
return $scheme.'://'.$abs;
}
// count zero begins
$intnumLinks = 0;
$extnumLinks = 0;
$nfnumLinks = 0;
$allnumLinks = 0;
// get url file
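$url = isset($_REQUEST['url']) ? trim($_REQUEST['url']) : '';
// only fetch http(s) URLs; file_get_contents() would otherwise also read local paths
if (!preg_match('#^https?://#i', $url)) die('Please pass an http(s) URL in the "url" parameter.');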
// get contents of url file
$html = file_get_contents($url);
// http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php
// loading DOM document
$doc=new DOMDocument();
@$doc->loadHTML($html);
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$strings=$xml->xpath('//a');
foreach ($strings as $string) {
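// cast to string: $string['href'] is a SimpleXMLElement, so $rel[0] inside
// path_to_absolute() would not be the first character of the href
$aa = path_to_absolute((string) $string['href'], $url);
// strip a leading "www." from BOTH hosts so www.example.com and example.com compare equal
$a = preg_replace('/^www\./', '', strtolower((string) parse_url($aa, PHP_URL_HOST)));
$b = preg_replace('/^www\./', '', strtolower((string) parse_url($url, PHP_URL_HOST)));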
if($a == $b){
echo 'call-host: ' . $b . '<br>';
echo 'type: int <br>';
echo 'title: ' . $string[0] . '<br>';
echo 'url: ' . $string['href'] . '<br>';
echo 'host: ' . $a . '<br><br>';
$intnumLinks++;
}else{
echo 'call-host: ' . $b . '<br>';
echo 'type: ext <br>';
echo 'title: ' . $string[0] . '<br>';
echo 'url: ' . $string['href'] . '<br>';
echo 'host: ' . $a . '<br><br>';
$extnumLinks++;
}
$allnumLinks++;
}
// count results
echo "<br>";
echo "Count int: $intnumLinks <br>";
echo "Count ext: $extnumLinks <br>";
echo "Count nf: $nfnumLinks <br>";
echo "Count all: $allnumLinks <br>";
?>
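For the nofollow and _blank part that is still marked as TO DO, here is a minimal sketch of one way it could be done. It is not part of the code above: `count_link_attributes()` and the sample markup are hypothetical names made up for illustration, and it assumes PHP 7 or later with the DOM extension enabled. It treats `rel` as a space-separated token list, so `rel="noopener nofollow"` also counts as nofollow.

```php
<?php
// Hypothetical helper (not from the original script): counts all <a> elements,
// those whose rel contains the "nofollow" token, and those with target="_blank".
function count_link_attributes(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $counts = array('all' => 0, 'nofollow' => 0, 'blank' => 0);
    foreach ($doc->getElementsByTagName('a') as $a) {
        $counts['all']++;
        // rel is a space-separated list of tokens; compare case-insensitively.
        $rel = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('nofollow', $rel, true)) {
            $counts['nofollow']++;
        }
        if (strtolower(trim($a->getAttribute('target'))) === '_blank') {
            $counts['blank']++;
        }
    }
    return $counts;
}

// Usage with made-up sample markup:
$sample = '<p><a href="/a" rel="nofollow">A</a> '
        . '<a href="http://example.org/" target="_blank" rel="noopener NoFollow">B</a> '
        . '<a href="/c">C</a></p>';
print_r(count_link_attributes($sample));
```

In the script above, the same two checks could presumably go inside the existing `foreach` loop to increment `$nfnumLinks`, reading `(string) $string['rel']` and `(string) $string['target']` from each SimpleXML element.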
Consider this post closed. At first I wanted to delete it, but someone might find this code useful for their own work.