PHP: Detect and correct outgoing links in HTML

Time: 2018-07-13 13:28:16

Tags: php html hyperlink autocorrect nofollow

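I need a function that corrects all outgoing links in a given HTML text by adding the attribute rel="nofollow" to them. Only outgoing links should be changed.

Example: my domain is www.laptops.com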

$myDomain = "www.laptops.com";

// Single quotes keep the double-quoted href attributes from ending the string.
$html = 
 'Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
  For more info go to <a href="www.apple.com">Apple.com</a> 
  or to <a href="www.appleblog.com">Appleblog.com</a>';

function correct($html, $myDomain) { 
    // get all links by filtering '<a ... href="{$link}" .....>' and 
    // check each one with isOutgoing($href, $myDomain)
}

$newHTML = correct($html, $myDomain);

echo $newHTML;

// Desired output:
// Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
// For more info go to <a rel="nofollow" href="www.apple.com">Apple.com</a> 
// or to <a rel="nofollow" href="www.appleblog.com">Appleblog.com</a> 

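So far I have a function isOutgoing($link) that detects whether a link is outgoing, but finding every <a ... href="{$link}" ...> part of the HTML text and extracting {$link} is giving me trouble. I know preg_match() should be able to do it, but I can't work out how.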

2 answers:

Answer 0 (score: 2)

You should avoid regular expressions here and use DOMDocument and DOMXPath instead.

For example:

<?php
$dom = new DOMDocument();

$dom->loadHTML('
Hello World have a look at <a href="www.laptops.com/apple">Apple Laptops</a>. 
  For more info go to <a href="www.apple.com">Apple.com</a> 
  or to <a href="www.appleblog.com">Appleblog.com</a>
', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

foreach ($xpath->query("//a") as $link) {
    $href = $link->getAttribute('href');

    // the href does not contain www.laptops.com, so treat the link as outgoing and add the rel attribute
    if (strpos($href, 'www.laptops.com') === false) {
        $link->setAttribute("rel", "nofollow noopener");
    }
}

echo $dom->saveHTML();

https://3v4l.org/DseDi
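
Building on the DOMDocument approach, here is a minimal sketch of the correct($html, $myDomain) function from the question. The isOutgoing() below is only a hypothetical stand-in for the asker's own function, whose real implementation is not shown. It assumes that scheme-less hrefs such as www.apple.com are meant as absolute links, and it compares hosts exactly rather than with strpos(). The sketch also assumes ASCII input (loadHTML() may mangle other encodings unless one is declared) and it overwrites any rel value a link already has.

```php
<?php
// Hypothetical stand-in for the asker's isOutgoing(); not their actual code.
function isOutgoing(string $href, string $myDomain): bool
{
    $href = trim($href);
    // Empty, fragment, query, path-relative and non-web links stay untouched.
    if ($href === '' || preg_match('~^(#|\?|/(?!/)|\.\.?/|(mailto|tel|javascript):)~i', $href)) {
        return false;
    }
    // Assumption: an href without a scheme ("www.apple.com") is an absolute link,
    // so give it a scheme to let parse_url() find the host.
    if (!preg_match('~^(https?:)?//~i', $href)) {
        $href = 'http://' . $href;
    }
    $host = parse_url($href, PHP_URL_HOST);
    // Outgoing means: a host was found and it is not our own domain.
    return is_string($host) && strcasecmp($host, $myDomain) !== 0;
}

function correct(string $html, string $myDomain): string
{
    $dom = new DOMDocument();
    // Silence libxml warnings about imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its contents can be extracted again later.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        if (isOutgoing($link->getAttribute('href'), $myDomain)) {
            // Note: this replaces an existing rel attribute, if any.
            $link->setAttribute('rel', 'nofollow');
        }
    }

    // Serialize only the children of the wrapper, not the <html>/<body> shell.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$html = 'Have a look at <a href="www.laptops.com/apple">Apple Laptops</a>, '
      . 'or go to <a href="www.apple.com">Apple.com</a>.';

echo correct($html, 'www.laptops.com');
```

Comparing the parsed host instead of searching the whole href means a link such as www.apple.com/?ref=www.laptops.com is still treated as outgoing. Subdomains of your own site count as outgoing in this sketch, so adjust the comparison if that is not what you want.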

Answer 1 (score: 0)

With a little jQuery, this becomes much easier.

<script type="text/javascript">
$(document).ready(function(){
    $('a').each(function(){
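        // prop('href') returns the resolved, absolute URL of the link
        let href = $(this).prop('href');
        // indexOf() returns -1 when the domain does not occur in the URL;
        // the original !href.search(...) was only true for a match at index 0
        if (href.indexOf('laptops.com') === -1) {
            $(this).prop('rel', 'nofollow');
        }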
    });
});
</script>