Remove all HTML attributes using a regular expression (replace)

Time: 2015-03-09 13:29:42

Tags: javascript html regex

For example, I have HTML like this:

<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper">< div class="modal-content"><div class="modal-body full long"><div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>

How can I remove all styles, classes, ids, and so on from HTML like this?

I have this regex:

/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i

What is wrong with it? How can I remove all HTML attributes with a regex?

Here is the fiddle:

http://jsfiddle.net/qL4maxn0/1/

3 Answers:

Answer 0: (score: 4)

You should not use a regex here.

var html = '<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper"><div class="modal-content"><div class="modal-body full long">                        <div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li>                            </ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div><div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>';
var div = document.createElement('div');
div.innerHTML = html;

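// Remove every attribute from a single element.
// Iterating backwards keeps the indices valid while the live
// attributes collection shrinks.
function removeAllAttrs(element) {
    for (var i = element.attributes.length - 1; i >= 0; i--) {
        element.removeAttributeNode(element.attributes[i]);
    }
}

// Recursively strip the attributes of every descendant of el
// (el itself is left untouched).
function removeAttributes(el) {
    var children = el.children;
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        removeAllAttrs(child);
        if (child.children.length) {
            removeAttributes(child);
        }
    }
}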
removeAttributes(div);
console.log(div.innerHTML);


Answer 1: (score: 1)

You are missing the g flag, which makes the replacement global.

/<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig
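To see the effect of the flag, here is a minimal sketch that runs the question's pattern with and without g. The shortened sample string, the variable names, and the '<$1$2>' replacement string are assumptions made for illustration; the question does not show the replacement it used.

```javascript
// Shortened, hypothetical sample (not the full HTML from the question).
var sample = '<div class="a"><p style="color: red;">text</p></div>';

// The question's pattern, without and with the g flag.
var firstOnlyPattern = /<([a-z][a-z0-9]*)[^>]*?(\/?)>/i;
var globalPattern = /<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig;

// $1 is the tag name, $2 is the optional "/" of a self-closing tag.
// The '<$1$2>' replacement is an assumption, not taken from the question.
var onlyFirst = sample.replace(firstOnlyPattern, '<$1$2>');
var allTags = sample.replace(globalPattern, '<$1$2>');

console.log(onlyFirst); // <div><p style="color: red;">text</p></div>
console.log(allTags);   // <div><p>text</p></div>
```

Without g, only the first opening tag is rewritten. Closing tags such as </p> are never matched, because the pattern requires a letter directly after <.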

Also, if you are doing this for security reasons, consider using a proper HTML sanitizer: Sanitize/Rewrite HTML on the Client Side

Answer 2: (score: 1)

First, I would advise against using a regex in this case; regexes are not meant to parse tree-shaped structures like HTML.

If you have no other choice, though, I think a regex can handle the problem as asked.

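It looks to me like you forgot about spaces, accents, and so on. You can rely on the fact that the greater-than > and less-than < signs are not allowed as raw text.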

/<\s*([a-z][a-z0-9]*)\s.*?>/gi

and call it like this:

result = body.replace(regex, '<$1>')

For your given sample, it produces:

<title>Ololo - text’s life</title><div><div><div><div><div><div><div>olololo<ul><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p>bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div></div></div>
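The pattern and replacement from this answer can be tried on smaller inputs, as in the sketch below. The helper name stripAttributes and both input strings are hypothetical and chosen only for illustration. The second input shows one case where the approach probably falls short: a > inside an attribute value ends the match early.

```javascript
// Pattern and replacement taken from the answer above.
var regex = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

// Hypothetical helper name, used only for this sketch.
function stripAttributes(body) {
    return body.replace(regex, '<$1>');
}

// Tags with attributes are reduced to the bare tag name; tags without
// attributes (<br>) and closing tags do not match and stay as they are.
var clean = stripAttributes('< div class="x"><p style="a">hi</p><br></div>');

// A ">" inside an attribute value ends the match too early, so part of
// the attribute text leaks into the output.
var broken = stripAttributes('<a title="a > b" href="#">link</a>');

console.log(clean);  // <div><p>hi</p><br></div>
console.log(broken); // <a> b" href="#">link</a>
```

Input like the second string is one reason a DOM-based approach or a proper sanitizer is the safer choice when the HTML is not under your control.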