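I want to use the Simple HTML DOM parser to search the content of an HTML page for email addresses and replace them.
Each replacement consists of a span element and a bit of JavaScript that is meant to obfuscate the address.
At the moment it works like this: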
$pattern =
"/(?:[a-z0-9!#$%&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/";
preg_match_all( $pattern, $content, $matches );
foreach ( $matches[ 0 ] as $email ) {
    $content = $this->searchDOM(
        $content,
        $email,
        $this->hide_email( $email )
    );
}
This is the searchDOM method:
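private function searchDOM( $content, $search, $replace, $excludedParents = [] )
{
    // Parse the HTML string into a DOM.
    $dom = HtmlDomParser::str_get_html(
        $content,
        true,
        true,
        DEFAULT_TARGET_CHARSET,
        false,
        DEFAULT_BR_TEXT,
        DEFAULT_SPAN_TEXT
    );
    // Only text nodes are visited, so addresses inside attributes are left alone.
    foreach ( $dom->find( 'text' ) as $element ) {
        // Skip text whose parent tag is on the exclusion list.
        if ( !in_array( $element->parent()->tag, $excludedParents ) ) {
            // Replace whole-token, case-insensitive occurrences of $search.
            $element->innertext = preg_replace(
                '/(?<!\w)' . preg_quote( $search, "/" ) . '(?!\w)/i',
                $replace,
                $element->innertext
            );
        }
    }
    return $dom->save();
}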
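One detail that may be worth checking in searchDOM: preg_replace() interprets backslashes and $n sequences in its replacement argument, so a doubled backslash in the generated script collapses to a single one before it reaches the page. The sketch below is only an illustration under that assumption, not a confirmed fix. It uses preg_replace_callback(), whose return value is inserted literally, and it builds the replacement once per occurrence, so a repeated address would not share one span id. The names replace_each_occurrence and $makeReplacement are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: replaces every occurrence of $search in $text with
// the result of calling $makeReplacement, using the same lookaround pattern
// as searchDOM. The callback result is inserted verbatim, with no
// backslash or $n processing.
function replace_each_occurrence(string $text, string $search, callable $makeReplacement): string
{
    $pattern = '/(?<!\w)' . preg_quote($search, '/') . '(?!\w)/i';
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($makeReplacement) {
            return $makeReplacement($m[0]);
        },
        $text
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

// Stand-in for hide_email(): produces a new id on every call.
$counter = 0;
$stub = function (string $email) use (&$counter): string {
    $counter++;
    return '<span id="e' . $counter . '">[hidden]</span>';
};

$result = replace_each_occurrence('Mail a@b.com or a@b.com today', 'a@b.com', $stub);
```

Inside searchDOM this would mean passing a callable such as `[ $this, 'hide_email' ]` instead of a ready-made replacement string, which is a change to the method's signature.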
This is the hide_email method:
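function hide_email( $email )
{
    // Substitution alphabet, in ascending ASCII order; the JS decoder relies on this when it sorts the key.
    $character_set = '+-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz';
    $key = str_shuffle( $character_set );
    $cipher_text = '';
    $id = 'e' . rand( 1, 999999999 );
    for ( $i = 0; $i < strlen( $email ); $i += 1 ) {
        $position = strpos( $character_set, $email[ $i ] );
        if ( $position === false ) {
            // The pattern also matches characters such as "!" or "#" that are not in the alphabet; leave such addresses unchanged.
            return $email;
        }
        $cipher_text .= $key[ $position ];
    }
    // Build the client-side decoder: sorting the key restores the original alphabet.
    $script = 'var a="' . $key . '";var b=a.split("").sort().join("");var c="' . $cipher_text . '";var d="";';
    $script .= 'for(var e=0;e<c.length;e++)d+=b.charAt(a.indexOf(c.charAt(e)));';
    $script .= 'document.getElementById("' . $id . '").innerHTML="<a href=\\"mailto:"+d+"\\">"+d+"</a>"';
    // Wrap the decoder in eval("..."), escaping backslashes and double quotes for the JS string literal.
    $script = "eval(\"" . str_replace( [ "\\", '"' ], [ "\\\\", '\"' ], $script ) . "\")";
    $script = '<script type="text/javascript">/*<![CDATA[*/' . $script . '/*]]>*/</script>';
    return '<span id="' . $id . '">[javascript protected email address]</span>' . $script;
}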
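To check the substitution scheme separately from the DOM handling, the decoding step that the generated JavaScript performs can be mirrored in PHP. This is a minimal sketch: encode_email and decode_email are hypothetical names, and only the character set is taken from the method above.

```php
<?php
// Alphabet copied from hide_email(); it is already in ascending ASCII order.
const CHARACTER_SET = '+-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz';

// Hypothetical helper: substitutes each character by the one at the same
// position in the shuffled key. Returns null if a character is outside the alphabet.
function encode_email(string $email, string $key): ?string
{
    $cipher = '';
    for ($i = 0, $n = strlen($email); $i < $n; $i++) {
        $pos = strpos(CHARACTER_SET, $email[$i]);
        if ($pos === false) {
            return null;
        }
        $cipher .= $key[$pos];
    }
    return $cipher;
}

// Hypothetical helper mirroring the JS decoder:
// b = a.split("").sort().join(""); d += b.charAt(a.indexOf(c.charAt(e)))
function decode_email(string $cipher, string $key): string
{
    $chars = str_split($key);
    sort($chars, SORT_STRING); // byte-wise sort restores the original alphabet
    $sorted = implode('', $chars);
    $plain = '';
    for ($i = 0, $n = strlen($cipher); $i < $n; $i++) {
        $plain .= $sorted[strpos($key, $cipher[$i])];
    }
    return $plain;
}

$key = str_shuffle(CHARACTER_SET);
$cipher = encode_email('john.doe@example.com', $key);
$decoded = decode_email($cipher, $key);
```

If the round trip succeeds here, the cipher itself is presumably not what breaks the rendered page, which would point towards how the script is inserted into the document.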
Well, this doesn't work as expected: the rendered page only shows "[javascript protected email address]". If I look at the page source, the a tag is missing.