Remove HTML tags and line breaks using regex

Time: 2018-08-27 11:47:22

Tags: javascript regex

I want to replace HTML tags and line breaks with <br> tags. I am using the following code for that, but it does not replace \r\n.

const newText = text.replace(/<script.*?<\/script>/g, '<br>')
  .replace(/<style.*?<\/style>/g, '<br>')
  .replace(/(<([^>]+)>)/ig, "<br>")
  .replace(/(?:\r\n|\r|\n)/g, '<br>')


Sample text

<div class="text-danger ng-binding" ng-bind-html="message.causedBy ">javax.xml.ws.soap.SOAPFaultException: Response was of unexpected text/html ContentType.  Incoming portion of HTML stream: \r\n\r\n\r\n\r\n500 - Internal server error.\r\n\r\n\r\n\r\n<div><h1>Server Error</h1></div>\r\n<div>\r\n <div class="\&quot;content-container\&quot;">\r\n  <h2>500 - Internal server error.</h2>\r\n  <h3>There is a problem with the resource you are looking for, and it cannot be displayed.</h3>\r\n </div>\r\n</div>\r\n\r\n\r\n\n\t</div>

Thanks for your help. (:

3 answers:

Answer 0 (score: 1)

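This works for me. Is the "\r" in your CRLF a single escape character, or is it two characters, "\" and "r"?

If your HTML element contains literal \n and \r text, those are ordinary characters, which would be really odd to see inside a div unless you are displaying source code. Plain old line breaks end up as single control characters, as expected, and your regex already matches those.

It is also unclear whether the source is extracted from an element or is static text.

You may have to escape the backslashes in the regex to handle the literal case.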

replace(/(?:\\r\\n|\\r|\\n)/g, '<br>')
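To make the distinction concrete, here is a minimal sketch that handles both cases. The helper name newlinesToBr and the two sample strings are made up for illustration and are not part of the original answer. It assumes the input never contains a backslash followed by "r" or "n" that should be kept as text (a Windows path such as C:\new would be damaged).

```javascript
// Hypothetical helper, not from the original answer.
function newlinesToBr(input) {
  return input
    // Literal case: a backslash character followed by "r" and/or "n".
    .replace(/\\r\\n|\\r|\\n/g, '<br>')
    // Real control characters: CRLF, lone CR, lone LF.
    .replace(/\r\n|\r|\n/g, '<br>');
}

const realBreaks = 'a\r\nb';              // 4 characters: a, CR, LF, b
const literalBreaks = String.raw`a\r\nb`; // 6 characters: a, \, r, \, n, b

console.log(realBreaks.length, literalBreaks.length); // 4 6
console.log(newlinesToBr(realBreaks));    // a<br>b
console.log(newlinesToBr(literalBreaks)); // a<br>b
```

Checking the string length, as above, is a quick way to find out which of the two cases your own data falls into.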

const text = `
<div class="text-danger ng-binding" ng-bind-html="message.causedBy ">javax.xml.ws.soap.SOAPFaultException: Response was of unexpected text/html ContentType.  Incoming portion of HTML stream: \r\n\r\n\r\n\r\n500 - Internal server error.\r\n\r\n\r\n\r\n<div><h1>Server Error</h1></div>\r\n<div>\r\n <div class="\&quot;content-container\&quot;">\r\n  <h2>500 - Internal server error.</h2>\r\n  <h3>There is a problem with the resource you are looking for, and it cannot be displayed.</h3>\r\n </div>\r\n</div>\r\n\r\n\r\n\n\t</div>`

const newText = text
  .replace(/<script.*?<\/script>/g, '<br>')
  .replace(/<style.*?<\/style>/g, '<br>')
  .replace(/(<([^>]+)>)/ig, "<br>")
  .replace(/(?:\r\n|\r|\n)/g, '<br>')
  //.replace(/(?:\\r\\n|\\r|\\n)/g, '<br>')
console.log(newText)

const text2 = document.getElementById('text').innerHTML
const newText2 = text2
  .replace(/<script.*?<\/script>/g, '<br>')
  .replace(/<style.*?<\/style>/g, '<br>')
  .replace(/(<([^>]+)>)/ig, "<br>")
  .replace(/(?:\r\n|\r|\n)/g, '<br>')
  //.replace(/(?:\\r\\n|\\r|\\n)/g, '<br>')
console.log(newText2)
<div id='text'>
This

is

<script>// nothing here </script>

a

div

These are literal \r\n\r\n and will not get escaped unless you uncomment the special case.

</div>

Answer 1 (score: 1)

  

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.

And so on.

Instead, a parser is readily available to you. Use it!

var tmp = document.createElement('div');
tmp.innerHTML = text;

// replace all start/end tags with <br> for... some reason, I guess!
Array.from(tmp.getElementsByTagName("*")).forEach(function(elem) {
    // ignore <br> tags
    if( elem.nodeName.match(/^br$/i)) {
        // do nothing
    }
    // outright remove <script> and <style>
    else if( elem.nodeName.match(/^(?:script|style)$/i)) {
        elem.parentNode.replaceChild(document.createElement('br'), elem);
    }
    // replace element with its contents and place a <br> before and after
    else {
        elem.parentNode.insertBefore(document.createElement('br'), elem);
        while(elem.firstChild) {
            elem.parentNode.insertBefore(elem.firstChild, elem);
        }
        elem.parentNode.replaceChild(document.createElement('br'), elem);
    }
});

var html = tmp.innerHTML;
// since replacing newlines with <br> is a string operation, go ahead and use regex for that
// the g flag is required, otherwise only the first line break is replaced
html = html.replace(/\r?\n/g, "<br />");
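A note on the final string step: String.prototype.replace with a regex replaces only the first match unless the regex carries the g flag, so the flag is what makes every line break become a <br />. The short sketch below shows the difference on a made-up sample string (the variable names are illustrative only).

```javascript
// Made-up sample with one CRLF and one bare LF.
const sampleHtml = 'line one\r\nline two\nline three';

// Without the g flag, only the first line break is replaced.
const firstOnly = sampleHtml.replace(/\r?\n/, '<br />');

// With the g flag, every line break is replaced.
const every = sampleHtml.replace(/\r?\n/g, '<br />');

console.log(JSON.stringify(firstOnly)); // "line one<br />line two\nline three"
console.log(JSON.stringify(every));     // "line one<br />line two<br />line three"
```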

Answer 2 (score: 0)

Simply replace everything that matches the pattern (<[^>]+>|\r|\n) with an empty string.

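This is a simple alternation, where \r is a carriage return and \n is a line feed (so it will reliably remove every line break, which is sometimes a combination of \r and \n).

<[^>]+> will match every HTML tag.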
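As a rough illustration, the pattern from this answer can be tried on a shortened fragment of the sample text. The variable names are made up for this sketch. Note that it removes the matches, whereas the question asked for <br>; passing '<br>' as the second argument would do that instead. The pattern also assumes no tag contains a ">" inside an attribute value.

```javascript
// Pattern from the answer, with the g flag so that all matches are replaced.
const pattern = /<[^>]+>|\r|\n/g;

// Shortened fragment of the question's sample, with real CR/LF characters.
const sample = '<div>\r\n <h2>500 - Internal server error.</h2>\r\n</div>';

const stripped = sample.replace(pattern, '');
console.log(JSON.stringify(stripped)); // " 500 - Internal server error."
```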