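Replacing numbers in body HTML using RegEx

Time: 2015-12-30 20:30:56

Tags: javascript html

I want to replace phone numbers in my web page without breaking the page's HTML. What is the correct way to do this?

I have read this answer: RegEx match open tags except XHTML self-contained tags

However, there is a Skype plugin that somehow replaces numbers in web pages. How does it do that?

Here is my code: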

var formats = '(xxx) xxx-xxxx|(xxx)xxx-xxxx|xxx-xxx-xxxx|xxx.xxx.xxxx|xxx xxx xxxx';
var str = '('+formats.replace(/([\(\)\+\-])/g, '\\$1').replace(/x/g,'\\d') + ')';

var r = RegExp(str,'g');
document.body.innerHTML=document.body.innerHTML.replace(r,'<a style="color:#07C !important; font-size:100% !important;" href="https://call.com/number=$1">$1</a>');

The problem I am facing is that it also rewrites attributes of tags inside the body. For example:

<a href="https://stackoverflow.com/a/4338544/1269037">validate phone numbers properly</a>

is replaced with this broken HTML:

<a href="https://stackoverflow.com/a/&lt;a style=" color:#07c="" !important;="" font-size:100%="" !important;"="">4338544/1269</a>

and the markup around it gets messed up as well.

I think the RegEx pattern is not well defined.
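One concrete cause of the broken output above can be read off the pattern itself: the `.` in the `xxx.xxx.xxxx` format is never escaped, so it matches any character. The sketch below rebuilds the question's pattern and runs it against the href from the example. The `escapedDot` variant is an illustrative assumption and not part of the original code.

```javascript
// Pattern built exactly as in the question: ( ) + - are escaped, but "." is not.
var formats = '(xxx) xxx-xxxx|(xxx)xxx-xxxx|xxx-xxx-xxxx|xxx.xxx.xxxx|xxx xxx xxxx';
var str = '(' + formats.replace(/([\(\)\+\-])/g, '\\$1').replace(/x/g, '\\d') + ')';
var original = RegExp(str, 'g');

// Illustrative variant (an assumption, not from the question): escape "." too.
var strEscaped = '(' + formats.replace(/([\(\)\+\-\.])/g, '\\$1').replace(/x/g, '\\d') + ')';
var escapedDot = RegExp(strEscaped, 'g');

var html = '<a href="https://stackoverflow.com/a/4338544/1269037">' +
           'validate phone numbers properly</a>';

var originalMatches = html.match(original);  // the unescaped "." matches "8" and "/"
var escapedMatches = html.match(escapedDot); // nothing in this URL looks like a phone number

console.log(originalMatches); // [ '4338544/1269' ]
console.log(escapedMatches);  // null
```

Escaping the dot only removes this particular false match. An attribute that happens to contain a well-formed phone number would still be rewritten, because `innerHTML` is processed as one flat string with no notion of tags or attributes.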

1 answer:

Answer 0: (score: 1)

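Using regular expressions to parse and manipulate HTML code is a nearly impossible task. There will always be edge cases that get missed.

A more reliable approach is to use the Document Object Model: walk through all text nodes and process the text of each node individually. When there is a match, use the DOM again to insert the link element.

Here is a working snippet that uses a TreeWalker: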

// Prepare search expression:
var formats = ['(xxx) xxx-xxxx',
               '(xxx)xxx-xxxx',
               'xxx-xxx-xxxx'];
var str = formats.join('|')         // split patterns by OR operator
    .replace(/[()+]/g, '\\$&')      // escape special characters
    .replace(/-/g, '[-. ]')         // hyphen can be space or dot as well
    .replace(/(^|[|])x/g, '$1\\bx') // require first digit to be start of a word
    .replace(/x($|[|])/g, 'x\\b$1') // require last digit to be end of a word
    .replace(/x/g, '\\d')           // set digit placeholders
;
var r = RegExp('(' + str + ')', '');                  
var node;
// create a walker for visiting all text nodes in the document
var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT,
                                       null, false);
while (node = walker.nextNode()) {
    // Do not process SCRIPT, OPTION and some other tag contents
    // You might need to extend this black-list:
    if (node.parentNode.tagName.search(
            /SCRIPT|SELECT|OPTION|BUTTON|TEXTAREA/) === -1) {
        // split text of node into parts <non-phone><phone><non-phone>...
        var parts = node.nodeValue.split(r);
        while (parts.length > 1) {
            var txt = parts.shift();
            if (txt.length) {
                // insert a text node for the non-phone text:
                node.parentNode.insertBefore(document.createTextNode(txt), node);
            }
            // get phone number, create a link for it
            var phone = parts.shift();
            var a = document.createElement('a');
            // set hyperlink, and pass digits only as URL argument:
            a.setAttribute('href',
                           'https://call.com/number=' + phone.replace(/[^\d]/g, ''));
            a.setAttribute('style', 
                           'color:#07C !important; font-size:100% !important;');
            a.textContent = phone;
            // insert link into the document
            node.parentNode.insertBefore(a, node);
        }
        // reduce the original node to the ending non-phone part
        node.nodeValue = parts[0];
    }
}
The snippet runs against the following test HTML:

This is a test. 
Following are valid:<br/>
<ul>
    <li>Please dial:473-299-8154</li>
    <li>or 678.269-1514, during weekends</li>
    <li>Private (732 939 8549)</li>
    <li>Back-up =(673) 137.4892</li>
</ul>
 Do not match any of these:<br/>
<ul>
    <li>a473-299-8154 because of a</li>
    <li>473-299-81549 because of last 9</li>
    <li>473/299.8154 because of slash</li>
</ul>
 Some elements whose content should not be parsed:
<form id="myform">
    <select id="sel">
        <option value="phone">123.456.7890</option>
    </select>
    <input  id="inp" type="text" value="123-321-1231">
    <button>123-321-1231</button><br/>
    <textarea>Links are not allowed in textareas:
123-321-1231</textarea>
</form>
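
The string handling in the answer can also be tried outside a browser. The following is a minimal sketch that reuses the answer's pattern-building steps and its digits-only URL argument. The helper names `buildPhoneRegExp`, `splitPhones` and `phoneToHref` are hypothetical and introduced only for illustration.

```javascript
// Hypothetical helper: builds the same search expression as the answer.
function buildPhoneRegExp() {
    var formats = ['(xxx) xxx-xxxx', '(xxx)xxx-xxxx', 'xxx-xxx-xxxx'];
    var str = formats.join('|')         // combine patterns with the OR operator
        .replace(/[()+]/g, '\\$&')      // escape special characters
        .replace(/-/g, '[-. ]')         // hyphen can be space or dot as well
        .replace(/(^|[|])x/g, '$1\\bx') // first digit must start a word
        .replace(/x($|[|])/g, 'x\\b$1') // last digit must end a word
        .replace(/x/g, '\\d');          // set digit placeholders
    return RegExp('(' + str + ')');
}

// Hypothetical helper: split text into [non-phone, phone, non-phone, ...].
// The capturing group makes split() keep the matched phone numbers.
function splitPhones(text) {
    return text.split(buildPhoneRegExp());
}

// Hypothetical helper: same URL scheme as the answer, digits only.
function phoneToHref(phone) {
    return 'https://call.com/number=' + phone.replace(/[^\d]/g, '');
}

var valid = splitPhones('or 678.269-1514, during weekends');
var wrapped = splitPhones('Back-up =(673) 137.4892');
var rejected = splitPhones('473-299-81549 because of last 9');

console.log(valid);           // [ 'or ', '678.269-1514', ', during weekends' ]
console.log(wrapped);         // [ 'Back-up =', '(673) 137.4892', '' ]
console.log(rejected.length); // 1
console.log(phoneToHref('(673) 137.4892')); // https://call.com/number=6731374892
```

An array of length 1 means no phone number was found, which is why the answer's inner loop only runs while `parts.length > 1`.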