How to check if a URL is external or internal with PHP?

Time: 2014-04-09 13:46:06

Tags: php html backend

I'm using this loop to get all the links (a href) on a page:

foreach($html->find('a[href!="#"]') as $ahref) {
    $ahrefs++;
}

I want to do something like this:

foreach($html->find('a[href!="#"]') as $ahref) {
    if(isexternal($ahref)) {
        $external++;
    }
    $ahrefs++;
}

where isexternal is a function:

function isexternal($url) {
    // FOO...

    // Test if link is internal/external
    if(/*condition is true*/) {
        return true;
    }
    else {
        return false;
    }
}

Help!

5 answers:

Answer 0: (score: 15)

Use parse_url and compare the host with your local host (usually, but not always, the same as $_SERVER['HTTP_HOST']):

function isexternal($url) {
  $components = parse_url($url);    
  return !empty($components['host']) && strcasecmp($components['host'], 'example.com') !== 0; // empty host will indicate url like '/relative.php'
}

However, this will treat www.example.com and example.com as different hosts. If you want all subdomains to count as local links, the function gets a bit bigger:

function isexternal($url) {
  $components = parse_url($url);
  if ( empty($components['host']) ) return false;  // we will treat url like '/relative.php' as relative
  if ( strcasecmp($components['host'], 'example.com') === 0 ) return false; // url host looks exactly like the local host
  return strrpos(strtolower($components['host']), '.example.com') !== strlen($components['host']) - strlen('.example.com'); // check if the url host is a subdomain
}
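As a rough sketch of how the subdomain-aware check could be generalized, the version below takes the local host as a parameter instead of hard-coding example.com. The function name is_external_url, the $localHost parameter, and the sample URLs are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: the name and the $localHost parameter are illustrative only.
function is_external_url($url, $localHost)
{
    // parse_url() yields null when there is no host and false for a badly malformed URL
    $host = parse_url($url, PHP_URL_HOST);
    if (empty($host)) {
        return false; // relative URLs such as '/relative.php' count as internal
    }

    $host      = strtolower($host);
    $localHost = strtolower($localHost);
    if ($host === $localHost) {
        return false; // exact match with the local host
    }

    // Internal only if the host ends with ".<local host>", i.e. it is a subdomain
    $suffix = '.' . $localHost;
    return substr($host, -strlen($suffix)) !== $suffix;
}

var_dump(is_external_url('/relative.php', 'example.com'));            // bool(false)
var_dump(is_external_url('https://www.example.com/', 'example.com')); // bool(false)
var_dump(is_external_url('https://notexample.com/', 'example.com'));  // bool(true)
```

In the question's loop, the second argument could be $_SERVER['HTTP_HOST'], keeping in mind that it is not always identical to the site's canonical host.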

Answer 1: (score: 0)

function isexternal($url) {
    // Test if link is internal/external:
    // internal if the URL contains our domain or is root-relative (starts with "/")
    $internal = strpos($url, 'domainname.com') !== false || strpos($url, '/') === 0;

    return !$internal;
}

Answer 2: (score: 0)

I know this post is old, but here is a function I just wrote. Maybe someone else needs it too.

function IsResourceLocal($url){
    if( empty( $url ) ){ return false; }
    $urlParsed = parse_url( $url );
    /* parse_url() omits 'host' for relative links, so avoid reading a missing key */
    $host = isset( $urlParsed['host'] ) ? $urlParsed['host'] : '';
    if( empty( $host ) ){
        /* maybe we have a relative link like: /wp-content/uploads/image.jpg */
        /* add absolute path to begin and check if file exists */
        $doc_root = $_SERVER['DOCUMENT_ROOT'];
        $maybefile = $doc_root.$url;
        /* Check if file exists */
        $fileexists = file_exists( $maybefile );
        if( $fileexists ){
            /* maybe you want to convert to full url? */
            return true;
        }
    }
    /* strip www. if exists */
    $host = str_replace('www.','',$host);
    $thishost = $_SERVER['HTTP_HOST'];
    /* strip www. if exists */
    $thishost = str_replace('www.','',$thishost);
    if( $host == $thishost ){
        return true;
    }
    return false;
}

Answer 3: (score: 0)

Here is a simple way to detect external URLs:

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';

$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    stripos( $url, '.' . $domain ) ||            // include subdomains, like "www.my-domain.com". DANGEROUS (see below)!
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);

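The check above treats both www.my-domain.com and my-domain.com as "internal".

Why this rule is dangerous

The subdomain logic introduces an exploitable weakness: when an external URL contains your domain in its path, for example https://external.com/www.my-domain.com, it is treated as internal!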
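To make the weakness concrete, here is a small sketch that runs the subdomain rule on the example URL and compares it with the host that parse_url reports. The variable names $looksInternal and $host are chosen here for illustration.

```php
<?php
$domain = 'my-domain.com';
$url    = 'https://external.com/www.my-domain.com';

// Substring-based subdomain rule: it finds ".my-domain.com" inside the path
$looksInternal = (bool) stripos($url, '.' . $domain);

// Host the link actually points to, as reported by parse_url()
$host = parse_url($url, PHP_URL_HOST);

var_dump($looksInternal); // bool(true)
var_dump($host);          // string(12) "external.com"
```

The substring rule says "internal" even though the link leaves the site, because it never distinguishes the host from the rest of the URL.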

Safer code

This problem can be eliminated by removing subdomain support (which I recommend):

$url    = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';

$internal = (
    false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
    (
        0 !== strpos( $url, '//' ) &&            // exclude protocol relative URLs, like "//example.com"
        0 === strpos( $url, '/' )                // include root-relative URLs, like "/demo"
    )
);
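One caveat that may be worth checking in the simplified version: the '//' . $domain test is still a substring match, so a host that merely starts with the domain also passes. The sketch below uses a made-up host, my-domain.com.evil.test, and compares the substring test with an exact host comparison via parse_url. Treat it as an illustration under those assumptions rather than a drop-in replacement.

```php
<?php
$domain = 'my-domain.com';
$url    = 'https://my-domain.com.evil.test/demo/'; // hypothetical hostile URL

// Substring test from the snippet above
$substringSaysInternal = false !== stripos($url, '//' . $domain);

// Exact, case-insensitive comparison of the parsed host
$host = parse_url($url, PHP_URL_HOST);
$hostSaysInternal = is_string($host) && strcasecmp($host, $domain) === 0;

var_dump($substringSaysInternal); // bool(true)
var_dump($hostSaysInternal);      // bool(false)
```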

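Answer 4: (score: -1)

You probably want to check whether the link is on the same domain. This only works if all href attributes are absolute and include the domain. Relative paths like /test/file.html are tricky, because a folder could have the same name as the domain. So, if you have the full URL in every link: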

function isexternal($url) {

  // Test if link is internal/external:
  // internal if the URL contains our domain or is root-relative (starts with "/")
  $internal = stristr($url, "myDomain.com") !== false || strpos($url, "/") === 0;

  return !$internal;
}