以下功能旨在将rel="nofollow"
属性应用于所有外部链接而不包含内部链接,除非该路径与下面定义为$my_folder
的预定义根URL匹配。
所以给出变量......
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
内容......
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com">external</a>
最终结果,更换后应该......
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator" rel="nofollow">internal cloaked link</a>
<a href="http://cnn.com" rel="nofollow">external</a>
请注意,第一个链接不会被更改,因为它是一个内部链接。
第二行上的链接也是内部链接,但由于它与我们的$my_folder
字符串匹配,因此它也会获得nofollow
。
第三个链接最简单,因为它与blog_url
不匹配,它显然是一个外部链接。
但是,在下面的脚本中,我的所有链接都获得nofollow
。如何修复脚本以执行我想要的操作?
function save_rseo_nofollow($content) {
$my_folder = $rseo['nofollow_folder'];
$blog_url = get_bloginfo('url');
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i <= sizeof($matches[0]); $i++){
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '~', $matches[0][$i])
|| !preg_match( '~'.$blog_url.'~',$matches[0][$i]))){
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
return $content;
}
答案 0 :(得分:14)
这是DOMDocument解决方案......
$str = '<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com" rel="me">external</a>
<a href="http://google.com">external</a>
<a href="http://example.com" rel="nofollow">external</a>
<a href="http://stackoverflow.com" rel="junk in the rel">external</a>
';
$dom = new DOMDocument();
$dom->preserveWhitespace = FALSE;
$dom->loadHTML($str);
$a = $dom->getElementsByTagName('a');
$host = strtok($_SERVER['HTTP_HOST'], ':');
foreach($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
if (preg_match('/^https?:\/\/' . preg_quote($host, '/') . '/', $href)) {
continue;
}
$noFollowRel = 'nofollow';
$oldRelAtt = $anchor->attributes->getNamedItem('rel');
if ($oldRelAtt == NULL) {
$newRel = $noFollowRel;
} else {
$oldRel = $oldRelAtt->nodeValue;
$oldRel = explode(' ', $oldRel);
if (in_array($noFollowRel, $oldRel)) {
continue;
}
$oldRel[] = $noFollowRel;
$newRel = implode($oldRel, ' ');
}
$newRelAtt = $dom->createAttribute('rel');
$noFollowNode = $dom->createTextNode($newRel);
$newRelAtt->appendChild($noFollowNode);
$anchor->appendChild($newRelAtt);
}
var_dump($dom->saveHTML());
string(509) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com" rel="me nofollow">external</a>
<a href="http://google.com" rel="nofollow">external</a>
<a href="http://example.com" rel="nofollow">external</a>
<a href="http://stackoverflow.com" rel="junk in the rel nofollow">external</a>
</body></html>
"
答案 1 :(得分:9)
首先尝试使其更具可读性,然后才能使if
规则更复杂:
function save_rseo_nofollow($content) {
$content["post_content"] =
preg_replace_callback('~<(a\s[^>]+)>~isU', "cb2", $content["post_content"]);
return $content;
}
function cb2($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/hostgator"; // re-add quirky config here
$blog_url = "http://localhost/";
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
提供以下输出:
[post_content] =>
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator" rel=nofollow>internal cloaked link</a>
<a href="http://cnn.com" rel=nofollow>external</a>
原始代码中的问题可能是$ rseo,但未在任何地方声明。
答案 2 :(得分:7)
试试这个(PHP 5.3 +):
和代码:
function nofollow($html, $skip = null) {
return preg_replace_callback(
"#(<a[^>]+?)>#is", function ($mach) use ($skip) {
return (
!($skip && strpos($mach[1], $skip) !== false) &&
strpos($mach[1], 'rel=') === false
) ? $mach[1] . ' rel="nofollow">' : $mach[0];
},
$html
);
}
示例:
echo nofollow('<a href="link somewhere" rel="something">something</a>');
// will be same because it's already contains rel parameter
echo nofollow('<a href="http://www.cnn.com">something</a>'); // ad
// add rel="nofollow" parameter to anchor
echo nofollow('<a href="http://localhost">something</a>', 'localhost');
// skip this link as internall link
答案 3 :(得分:3)
使用正则表达式来正确完成这项工作会非常复杂。使用实际的解析器会更容易,例如DOM extension中的解析器。 DOM不是非常适合初学者,因此您可以使用DOM加载HTML,然后使用SimpleXML运行修改。它们由相同的库支持,因此很容易与另一个库一起使用。
以下是它的外观:
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
$html = '<html><body>
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com">external</a>
</body></html>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$sxe = simplexml_import_dom($dom);
// grab all <a> nodes with an href attribute
foreach ($sxe->xpath('//a[@href]') as $a)
{
if (substr($a['href'], 0, strlen($blog_url)) === $blog_url
&& substr($a['href'], 0, strlen($my_folder)) !== $my_folder)
{
// skip all links that start with the URL in $blog_url, as long as they
// don't start with the URL from $my_folder;
continue;
}
if (empty($a['rel']))
{
$a['rel'] = 'nofollow';
}
else
{
$a['rel'] .= ' nofollow';
}
}
$new_html = $dom->saveHTML();
echo $new_html;
正如您所看到的,它非常简短。根据您的需要,您可能希望使用preg_match()
代替strpos()
内容,例如:
// change the regexp to your own rules, here we match everything under
// "http://localhost/mytest/" as long as it's not followed by "go"
if (preg_match('#^http://localhost/mytest/(?!go)#', $a['href']))
{
continue;
}
当我第一次阅读问题时,我错过了OP中的最后一个代码块。我发布的代码(基本上是基于DOM的任何解决方案)更适合处理整个页面而不是HTML块。否则,DOM会尝试“修复”您的HTML并添加<body>
标记,DOCTYPE等...
答案 4 :(得分:0)
<?
$str='<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com">external</a>';
function test($x){
if (preg_match('@localhost/mytest/(?!go/)@i',$x[0])>0) return $x[0];
return 'rel="nofollow" '.$x[0];
}
echo preg_replace_callback('/href=[\'"][^\'"]+/i', 'test', $str);
?>
答案 5 :(得分:0)
这是另一个具有白名单选项并添加tagret Blank属性的解决方案。 并且还会在添加新属性之前检查是否已经存在rel属性。
function Add_Nofollow_Attr($Content, $Whitelist = [], $Add_Target_Blank = true)
{
$Whitelist[] = $_SERVER['HTTP_HOST'];
foreach ($Whitelist as $Key => $Link)
{
$Host = preg_replace('#^https?://#', '', $Link);
$Host = "https?://". preg_quote($Host, '/');
$Whitelist[$Key] = $Host;
}
if(preg_match_all("/<a .*?>/", $Content, $matches, PREG_SET_ORDER))
{
foreach ($matches as $Anchor_Tag)
{
$IS_Rel_Exist = $IS_Follow_Exist = $IS_Target_Blank_Exist = $Is_Valid_Tag = false;
if(preg_match_all("/(\w+)\s*=\s*['|\"](.*?)['|\"]/",$Anchor_Tag[0],$All_matches2))
{
foreach ($All_matches2[1] as $Key => $Attr_Name)
{
if($Attr_Name == 'href')
{
$Is_Valid_Tag = true;
$Url = $All_matches2[2][$Key];
// bypass #.. or internal links like "/"
if(preg_match('/^\s*[#|\/].*/', $Url))
{
continue 2;
}
foreach ($Whitelist as $Link)
{
if (preg_match("#$Link#", $Url)) {
continue 3;
}
}
}
else if($Attr_Name == 'rel')
{
$IS_Rel_Exist = true;
$Rel = $All_matches2[2][$Key];
preg_match("/[n|d]ofollow/", $Rel, $match, PREG_OFFSET_CAPTURE);
if( count($match) > 0 )
{
$IS_Follow_Exist = true;
}
else
{
$New_Rel = 'rel="'. $Rel . ' nofollow"';
}
}
else if($Attr_Name == 'target')
{
$IS_Target_Blank_Exist = true;
}
}
}
$New_Anchor_Tag = $Anchor_Tag;
if(!$IS_Rel_Exist)
{
$New_Anchor_Tag = str_replace(">",' rel="nofollow">',$Anchor_Tag);
}
else if(!$IS_Follow_Exist)
{
$New_Anchor_Tag = preg_replace("/rel=[\"|'].*?[\"|']/",$New_Rel,$Anchor_Tag);
}
if($Add_Target_Blank && !$IS_Target_Blank_Exist)
{
$New_Anchor_Tag = str_replace(">",' target="_blank">',$New_Anchor_Tag);
}
$Content = str_replace($Anchor_Tag,$New_Anchor_Tag,$Content);
}
}
return $Content;
}
要使用它:
$Page_Content = '<a href="http://localhost/">internal</a>
<a href="http://yoursite.com">internal</a>
<a href="http://google.com">google</a>
<a href="http://example.com" rel="nofollow">example</a>
<a href="http://stackoverflow.com" rel="random">stackoverflow</a>';
$Whitelist = ["http://yoursite.com","http://localhost"];
echo Add_Nofollow_Attr($Page_Content,$Whitelist,true);
答案 6 :(得分:0)
感谢@alex提供出色的解决方案。但是,我在日语文字方面遇到了问题。我已经按照以下方式修复了它。另外,此代码可以使用SyntaxError: Unexpected token '<'
SyntaxError: Unexpected token '<'
数组跳过多个域。
$whiteList
答案 7 :(得分:0)
WordPress 决定:
function replace__method($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/articles"; // re-add quirky config here
$blog_url = 'https://'.$_SERVER['SERVER_NAME'];
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
add_filter( 'the_content', 'add_nofollow_to_external_links', 1 );
function add_nofollow_to_external_links( $content ) {
$content = preg_replace_callback('~<(a\s[^>]+)>~isU', "replace__method", $content);
return $content;
}
答案 8 :(得分:-1)
一个允许自动添加nofollow并保留其他属性的好脚本
function nofollow(string $html, string $baseUrl = null) {
return preg_replace_callback(
'#<a([^>]*)>(.+)</a>#isU', function ($mach) use ($baseUrl) {
list ($a, $attr, $text) = $mach;
if (preg_match('#href=["\']([^"\']*)["\']#', $attr, $url)) {
$url = $url[1];
if (is_null($baseUrl) || !str_starts_with($url, $baseUrl)) {
if (preg_match('#rel=["\']([^"\']*)["\']#', $attr, $rel)) {
$relAttr = $rel[0];
$rel = $rel[1];
}
$rel = 'rel="' . ($rel ? (strpos($rel, 'nofollow') ? $rel : $rel . ' nofollow') : 'nofollow') . '"';
$attr = isset($relAttr) ? str_replace($relAttr, $rel, $attr) : $attr . ' ' . $rel;
$a = '<a ' . $attr . '>' . $text . '</a>';
}
}
return $a;
},
$html
);
}