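Modify the DOM and replace email addresses

Time: 2017-06-23 08:53:53

Tags: javascript php html dom simpledom

I want to use the Simple HTML DOM parser to search the content of an HTML page for email addresses and replace them.

Each replacement consists of a span element and a small piece of JS (which is supposed to obfuscate the address).

Currently it works as follows: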

        $pattern =
            "/(?:[a-z0-9!#$%&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/";

        preg_match_all( $pattern, $content, $matches );

        foreach ( $matches[ 0 ] as $email ) {
             $content = $this->searchDOM(
                $content,
                $email,
                $this->hide_email($email)
            );
        }

This is the searchDOM method:

private function searchDOM( $content, $search, $replace, $excludedParents = [] )
{
    $dom = HtmlDomParser::str_get_html(
        $content,
        true,
        true,
        DEFAULT_TARGET_CHARSET,
        false,
        DEFAULT_BR_TEXT,
        DEFAULT_SPAN_TEXT
    );

    foreach ( $dom->find( 'text' ) as $element ) {

        if ( !in_array( $element->parent()->tag, $excludedParents ) ) {
            $element->innertext = preg_replace(
                '/(?<!\w)' . preg_quote( $search, "/" ) . '(?!\w)/i',
                $replace,
                $element->innertext
            );
        }
    }

    return $dom->save();
}
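
To illustrate what the lookaround guard in the `preg_replace` call does, here is a minimal JavaScript sketch of the same idea. `escapeRegExp` and `replaceWholeToken` are hypothetical helper names and are not part of Simple HTML DOM. The sketch works on a plain string rather than on parsed text nodes.

```javascript
// Escape regex metacharacters so the search term is matched literally
// (this plays the role of PHP's preg_quote in this sketch).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace `search` only where it is neither directly preceded
// nor directly followed by a word character.
function replaceWholeToken(haystack, search, replacement) {
  const pattern = new RegExp(
    "(?<!\\w)" + escapeRegExp(search) + "(?!\\w)",
    "gi"
  );
  // A replacer function avoids special treatment of "$" in the replacement.
  return haystack.replace(pattern, () => replacement);
}

console.log(replaceWholeToken("Mail a@b.com or xa@b.com", "a@b.com", "[hidden]"));
```

With this guard, an address embedded in a longer token (such as `xa@b.com` when searching for `a@b.com`) is left untouched, which seems to be the intent of the `(?<!\w)` and `(?!\w)` parts of the original pattern.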

This is the hide_email method:

function hide_email( $email )
{
    $character_set = '+-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz';

    $key         = str_shuffle( $character_set );
    $cipher_text = '';
    $id          = 'e' . rand( 1, 999999999 );

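    // Substitute each character with the key character at the same index.
    // Assumes every character of $email occurs in $character_set.
    for ( $i = 0; $i < strlen( $email ); $i += 1 ) {
        $cipher_text .= $key[ strpos( $character_set, $email[ $i ] ) ];
    }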

    $script = 'var a="' . $key . '";var b=a.split("").sort().join("");var c="' . $cipher_text . '";var d="";';

    $script .= 'for(var e=0;e<c.length;e++)d+=b.charAt(a.indexOf(c.charAt(e)));';

    $script .= 'document.getElementById("' . $id . '").innerHTML="<a href=\\"mailto:"+d+"\\">"+d+"</a>"';

    $script = "eval(\"" . str_replace( [ "\\", '"' ], [ "\\\\", '\"' ], $script ) . "\")";

    $script = '<script type="text/javascript">/*<![CDATA[*/' . $script . '/*]]>*/</script>';

    return '<span id="' . $id . '">[javascript protected email address]</span>' . $script;

}
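
The generated script undoes a simple substitution cipher: each character of the address is swapped for the character at the same position in a shuffled copy of the character set, and the browser reverses that lookup. Below is a minimal standalone JavaScript sketch of that round trip, assuming every character of the address occurs in the character set. `encode` and `decode` are hypothetical names used only for illustration, and the key is a fixed rotation instead of the random shuffle that `str_shuffle` produces.

```javascript
// Same character set as in hide_email; it is already in ascending code-unit order.
const CHARACTER_SET =
  "+-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

// Mirrors the PHP loop: swap each character for the key character at the same index.
function encode(email, key) {
  let cipherText = "";
  for (const ch of email) {
    const index = CHARACTER_SET.indexOf(ch);
    if (index === -1) throw new Error("character not in set: " + ch);
    cipherText += key[index];
  }
  return cipherText;
}

// Mirrors the generated script: sorting the key recovers the character set,
// then each cipher character is mapped back through its position in the key.
function decode(cipherText, key) {
  const sorted = key.split("").sort().join("");
  let plain = "";
  for (let i = 0; i < cipherText.length; i++) {
    plain += sorted.charAt(key.indexOf(cipherText.charAt(i)));
  }
  return plain;
}

// Illustrative fixed key: the character set rotated by one position.
const key = CHARACTER_SET.slice(1) + CHARACTER_SET[0];
const hidden = encode("john.doe@example.com", key);
console.log(hidden, "->", decode(hidden, key));
```

Running the cipher in isolation like this can help separate two questions: whether the decoding logic itself is sound, and whether the script still reaches the browser intact after the DOM has been parsed and saved again.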

Well, this does not work as expected: the rendered page only shows "[javascript protected email address]". When I view the page source, the a tag is missing.


0 answers:

No answers