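Counting internal and external links on a web page with PHP

Posted: 2016-12-14 21:19:12

Tags: php

Here is my code. It is partly based on several different snippets that you can easily find in various places with a Google search. For any web page, I am trying to count the internal links, the external links, all links, and (TO DO) `.nofollow` links. This is what I have so far. Most of the results are correct, but a few generic calls give me strange results, and I still need to handle `.nofollow` and `_blank`. If you want to comment with some explanation of the logic, or add or change anything, please do. Thanks a lot.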
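One plausible source of strange results is an asymmetric host comparison. If `www.` is stripped from the link host but not from the page host (or the reverse), a page served from `www.example.com` reports all of its own links as external. Below is a minimal sketch of a symmetric comparison. It assumes "internal" means "same host after lower-casing and removing one leading `www.`", so a subdomain such as `blog.example.com` still counts as external. The helper names `normalize_host` and `classify_link` are hypothetical and do not come from the original code.

```php
<?php
// Sketch: classify a link as internal or external by comparing hosts.
// Assumption: "internal" = same host once case is ignored and a leading
// "www." is dropped; subdomains such as blog.example.com count as external.

// Lower-case a host and drop one leading "www."
function normalize_host(string $host): string
{
    return preg_replace('/^www\./', '', strtolower($host));
}

// Return 'int' or 'ext' for an already-absolute link URL.
function classify_link(string $absoluteLink, string $pageUrl): string
{
    $linkHost = normalize_host((string) parse_url($absoluteLink, PHP_URL_HOST));
    $pageHost = normalize_host((string) parse_url($pageUrl, PHP_URL_HOST));
    // mailto:, javascript: and tel: links have no host, so they end up 'ext'
    return ($linkHost !== '' && $linkHost === $pageHost) ? 'int' : 'ext';
}

echo classify_link('https://example.com/about', 'https://www.example.com/'), "\n";
echo classify_link('https://blog.example.com/', 'https://www.example.com/'), "\n";
echo classify_link('mailto:me@example.com', 'https://www.example.com/'), "\n";
```

These three calls print `int`, `ext` and `ext`. Whether `mailto:` links should count as external at all is a design choice. Skipping links without a host entirely would be an equally reasonable rule.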

<?php

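// transform a (possibly relative) href into an absolute URL, resolved against $base
function path_to_absolute($rel, $base)
{
    $rel = trim($rel);
    /* an empty href points back to the page itself */
    if ($rel === '') return $base;
    /* return if already absolute URL (this also covers mailto:, javascript:, tel:) */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    /* parse base URL; give every part a default so nothing is left undefined */
    $parts  = parse_url($base);
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $host   = isset($parts['host']) ? $parts['host'] : '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path   = isset($parts['path']) ? $parts['path'] : '';
    /* protocol-relative URL such as //cdn.example.com/x.js keeps the page's scheme */
    if (substr($rel, 0, 2) == '//') return $scheme . ':' . $rel;
    /* an anchor replaces the base fragment; a query replaces the base query and fragment */
    if ($rel[0] == '#') return preg_replace('/#.*$/s', '', $base) . $rel;
    if ($rel[0] == '?') return preg_replace('/[?#].*$/s', '', $base) . $rel;
    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);
    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';
    /* dirty absolute URL */
    $abs = "$host$port$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}
    /* absolute URL is ready! */
    return $scheme . '://' . $abs;
}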


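// counters start at zero
$intnumLinks = 0;
$extnumLinks = 0;
$nfnumLinks = 0;   // TO DO: rel="nofollow" links are not counted yet
$allnumLinks = 0;

// page URL from the request; accept only http(s) so that file_get_contents()
// cannot be pointed at local files such as /etc/passwd
$url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';
$urlScheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
if (!in_array($urlScheme, array('http', 'https'), true)) {
    die('Please pass an http:// or https:// URL.');
}
// fetch the page
$html = file_get_contents($url);
if ($html === false || $html === '') {
    die('Could not fetch ' . htmlspecialchars($url));
}
// http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php
// load the DOM; collect parser warnings instead of hiding them with @
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();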

$xml=simplexml_import_dom($doc); // just to make xpath more simple
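$strings = $xml->xpath('//a[@href]'); // only <a> elements that actually have an href

// host of the page itself, normalized the same way as every link host
$b = strtolower(preg_replace('/^www\./i', '', (string) parse_url($url, PHP_URL_HOST)));

foreach ($strings as $string) {
    $href = (string) $string['href'];   // quoted key: a bare `href` is an undefined constant
    $aa = path_to_absolute($href, $url);
    // host of the link, lower-cased and without a leading "www."
    $a = strtolower(preg_replace('/^www\./i', '', (string) parse_url($aa, PHP_URL_HOST)));

    // same host as the page => internal; anything else (other domains,
    // mailto:, javascript:, tel:) => external
    $type = ($a === $b) ? 'int' : 'ext';
    if ($type === 'int') {
        $intnumLinks++;
    } else {
        $extnumLinks++;
    }
    $allnumLinks++;

    // escape everything taken from the fetched page before echoing it
    echo 'call-host: ' . htmlspecialchars($b) . '<br>';
    echo 'type: ' . $type . '<br>';
    echo 'title: ' . htmlspecialchars(trim((string) $string)) . '<br>';
    echo 'url: ' . htmlspecialchars($href) . '<br>';
    echo 'host: ' . htmlspecialchars($a) . '<br><br>';
}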

// count results 
echo "<br>";
echo "Count int: $intnumLinks <br>";
echo "Count ext: $extnumLinks <br>";
echo "Count nf: $nfnumLinks <br>";
echo "Count all: $allnumLinks <br>";
?>
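For the `.nofollow` and `_blank` counts that are still marked TO DO, here is a minimal sketch using `DOMXPath`. It assumes "nofollow" means the token `nofollow` appears in the link's `rel` attribute, which may hold several space-separated values. It also assumes "`_blank`" means `target="_blank"`, matched case-insensitively. The function `count_link_flags` and its array keys are hypothetical. In the SimpleXML loop of the script, the same checks would read `(string) $string['rel']` and `(string) $string['target']`, and would increment `$nfnumLinks`.

```php
<?php
// Sketch: count rel="nofollow" and target="_blank" links with DOMXPath.
// Assumption: rel may contain several space-separated tokens
// ("nofollow noopener"), so "nofollow" is matched as a whole token.

function count_link_flags(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep malformed-HTML warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $counts = array('all' => 0, 'nofollow' => 0, 'blank' => 0);

    foreach ($xpath->query('//a[@href]') as $a) {
        $counts['all']++;
        // split rel on whitespace and compare the tokens in lower case
        $rel = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('nofollow', $rel, true)) {
            $counts['nofollow']++;
        }
        // target keywords such as _blank are ASCII case-insensitive in HTML
        if (strtolower(trim($a->getAttribute('target'))) === '_blank') {
            $counts['blank']++;
        }
    }
    return $counts;
}

$sample = '<a href="/a">A</a>'
        . '<a href="https://x.org" rel="nofollow noopener" target="_blank">B</a>'
        . '<a href="https://y.org" rel="NoFollow">C</a>'
        . '<a name="top">no href</a>';
print_r(count_link_flags($sample));
```

For this sample, the function finds 3 links with an `href`. Two of them are nofollow, and one opens in a new tab. The named anchor without an `href` is not counted.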

Consider this post closed. At first I wanted to delete it, but someone might be able to use this code in their own work.

0 answers