I'm using this loop to get all the links (a href) on a page:
foreach($html->find('a[href!="#"]') as $ahref) {
$ahrefs++;
}
I'd like to do something like this:
foreach($html->find('a[href!="#"]') as $ahref) {
if(isexternal($ahref)) {
$external++;
}
$ahrefs++;
}
where isexternal is a function:
function isexternal($url) {
// FOO...
// Test if link is internal/external
if(/*condition is true*/) {
return true;
}
else {
return false;
}
}
Help!
Answer 0 (score: 15)
Use parse_url and compare the host against your local host (usually, but not always, the same as $_SERVER['HTTP_HOST']):
function isexternal($url) {
$components = parse_url($url);
return !empty($components['host']) && strcasecmp($components['host'], 'example.com'); // empty host will indicate url like '/relative.php'
}
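To see why an empty host signals a relative link, it helps to look at what parse_url reports for different URL shapes. This is only a small sketch: the helper name hostOf and the sample URLs are made up for illustration.

```php
<?php
// Hypothetical helper: returns the host component of a URL, or null when there is none.
function hostOf(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST); // null if absent, false if the URL is malformed
    return is_string($host) ? $host : null;
}

// Illustrative URLs only; none of them come from the question.
$samples = [
    '/relative.php',            // root-relative: no host
    'page.html?x=1',            // document-relative: no host
    'https://example.com/a',    // absolute: host is "example.com"
    '//cdn.example.org/lib.js', // protocol-relative: host is "cdn.example.org"
];

foreach ($samples as $url) {
    printf("%-28s => %s\n", $url, hostOf($url) ?? '(no host)');
}
```

Note that a protocol-relative URL such as //cdn.example.org/lib.js does carry a host, so a host comparison classifies it correctly without any special casing.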
However, this treats www.example.com and example.com as different hosts. If you want every subdomain to count as a local link, the function gets a bit longer:
function isexternal($url) {
$components = parse_url($url);
if ( empty($components['host']) ) return false; // we will treat url like '/relative.php' as relative
if ( strcasecmp($components['host'], 'example.com') === 0 ) return false; // url host looks exactly like the local host
return strrpos(strtolower($components['host']), '.example.com') !== strlen($components['host']) - strlen('.example.com'); // check if the url host is a subdomain
}
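Here is a rough sketch of how the subdomain-aware check could be plugged into the counting loop from the question. It is not the code above verbatim. The local host is passed as a parameter instead of being hard-coded, the suffix test uses substr, and the names linkIsExternal and countLinks are invented for this example. It also assumes the href values have already been collected by whatever HTML parser you use.

```php
<?php
// Hypothetical variant of the check above, with the local host passed in as a parameter.
function linkIsExternal(string $url, string $localHost): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host: relative URL, treated as internal
    }
    $host = strtolower($host);
    $localHost = strtolower($localHost);
    if ($host === $localHost) {
        return false; // exactly the local host
    }
    // Subdomain test: the host must end with ".<localHost>".
    $suffix = '.' . $localHost;
    return substr($host, -strlen($suffix)) !== $suffix;
}

// Counts all hrefs and the external ones, skipping "#" as the question's selector does.
function countLinks(array $hrefs, string $localHost): array
{
    $total = 0;
    $external = 0;
    foreach ($hrefs as $href) {
        if ($href === '#') {
            continue;
        }
        $total++;
        if (linkIsExternal($href, $localHost)) {
            $external++;
        }
    }
    return ['total' => $total, 'external' => $external];
}

// Sample hrefs, made up for the demo.
$hrefs = ['/about', '#', 'https://www.example.com/x', 'https://other.org/', 'HTTP://EXAMPLE.COM/'];
print_r(countLinks($hrefs, 'example.com'));
```

Requiring the leading dot in the suffix is what keeps a host like notexample.com from being mistaken for a subdomain of example.com.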
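Answer 1 (score: 0)
function isexternal($url) {
// A link counts as internal if it mentions the local domain or is root-relative (starts with "/").
// strpos() returns an int, so the position must be compared with 0, not with the string '0'.
if(strpos($url,'domainname.com') !== false || strpos($url,"/") === 0)
{
return false;
}
else
{
return true;
}
}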
Answer 2 (score: 0)
I know this post is old, but I just wrote this function. Maybe someone else needs it too.
function IsResourceLocal($url){
if( empty( $url ) ){ return false; }
$urlParsed = parse_url( $url );
$host = $urlParsed['host'] ?? ''; /* relative URLs have no host component */
if( empty( $host ) ){
/* maybe we have a relative link like: /wp-content/uploads/image.jpg */
/* add absolute path to begin and check if file exists */
$doc_root = $_SERVER['DOCUMENT_ROOT'];
$maybefile = $doc_root.$url;
/* Check if file exists */
$fileexists = file_exists ( $maybefile );
if( $fileexists ){
/* maybe you want to convert to full url? */
return true;
}
}
/* strip www. if exists */
$host = str_replace('www.','',$host);
$thishost = $_SERVER['HTTP_HOST'];
/* strip www. if exists */
$thishost = str_replace('www.','',$thishost);
if( $host == $thishost ){
return true;
}
return false;
}
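One thing to watch in the function above: str_replace removes "www." wherever it occurs in the host, not only at the start, and $_SERVER['HTTP_HOST'] may carry a port. A possible normalization step is sketched below. The helper name normalizeHost is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper: lower-cases a host, drops a trailing port, strips one leading "www.".
function normalizeHost(string $host): string
{
    $host = strtolower($host);
    // Drop a port such as ":8080", which the Host header may include.
    $host = preg_replace('/:\d+$/', '', $host);
    // Anchor at the start so a host like "awww.example.com" is left alone.
    return preg_replace('/^www\./', '', $host);
}

var_dump(normalizeHost('WWW.Example.com:8080'));            // normalized with the sketch
var_dump(str_replace('www.', '', 'awww.example.com'));      // the unanchored replace, for comparison
```

Both sides of the comparison ($host and $thishost) would need to go through the same normalization for the equality test to be meaningful.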
Answer 3 (score: 0)
Here is a simple way to detect external URLs:
$url = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';
$internal = (
false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
stripos( $url, '.' . $domain ) || // include subdomains, like "www.my-domain.com". DANGEROUS (see below)!
(
0 !== strpos( $url, '//' ) && // exclude protocol relative URLs, like "//example.com"
0 === strpos( $url, '/' ) // include root-relative URLs, like "/demo"
)
);
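The check above treats both www.my-domain.com and my-domain.com as "internal".
Why this rule is dangerous:
The subdomain logic introduces an exploitable weakness: when an external URL contains your domain in its path, for example https://external.com/www.my-domain.com, it is treated as internal!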
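To make the weakness concrete, here is the same expression wrapped in a function so it can be called with different URLs. The function name looksInternalLoose is made up for this demo, and the logic is copied from the check above without changes.

```php
<?php
// The loose check from above, wrapped in a function (the name is hypothetical).
function looksInternalLoose(string $url, string $domain): bool
{
    return (bool) (
        false !== stripos($url, '//' . $domain) ||               // "//my-domain.com", "http://my-domain.com"
        stripos($url, '.' . $domain) ||                           // subdomains, matched anywhere in the string
        (0 !== strpos($url, '//') && 0 === strpos($url, '/'))     // root-relative, but not protocol-relative
    );
}

$domain = 'my-domain.com';
var_dump(looksInternalLoose('https://www.my-domain.com/demo/', $domain));        // a real subdomain
var_dump(looksInternalLoose('https://external.com/www.my-domain.com', $domain)); // domain only in the path
```

Both calls report true, because the subdomain test searches the whole URL string rather than just the host part.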
Safer code:
The problem can be eliminated by dropping subdomain support (which I recommend):
$url = 'https://my-domain.com/demo/';
$domain = 'my-domain.com';
$internal = (
false !== stripos( $url, '//' . $domain ) || // include "//my-domain.com" and "http://my-domain.com"
(
0 !== strpos( $url, '//' ) && // exclude protocol relative URLs, like "//example.com"
0 === strpos( $url, '/' ) // include root-relative URLs, like "/demo"
)
);
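Since '//' . $domain is still matched as a substring, a URL whose host merely begins with the domain also passes this check. If that matters for your use case, one option is to compare the parsed host instead. The sketch below assumes that links without a host are relative and therefore internal. The names looksInternalSafer and isInternalByHost are hypothetical.

```php
<?php
// The "safer" check from above, wrapped in a function for comparison (name is made up).
function looksInternalSafer(string $url, string $domain): bool
{
    return false !== stripos($url, '//' . $domain)
        || (0 !== strpos($url, '//') && 0 === strpos($url, '/'));
}

// Alternative sketch: compare the parsed host exactly, ignoring case.
function isInternalByHost(string $url, string $domain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return true; // assumption: no host means a relative, internal link
    }
    return strcasecmp($host, $domain) === 0;
}

$domain = 'my-domain.com';
$tricky = 'https://my-domain.com.evil.example/'; // illustrative host that only starts with the domain
var_dump(looksInternalSafer($tricky, $domain));
var_dump(isInternalByHost($tricky, $domain));
```

The substring check accepts the tricky URL while the host comparison rejects it. The trade-off is that the host comparison also needs a decision for schemes like mailto:, which have no host and would count as internal under the assumption above.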
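Answer 4 (score: -1)
You probably want to check whether the link is on the same domain. This only works if all href attributes are absolute and include the domain. Relative ones like /test/file.html are tricky, because a folder can have the same name as the domain. So, if you have the full URL in every link: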
function isexternal($url) {
// Test if link is internal/external:
// internal if the URL mentions the local domain or is root-relative (starts with "/")
if(stristr($url, "myDomain.com") !== false || strpos($url,"/") === 0)
return false;
else
return true;
}