Question

I have the following code:

 textResponse = textResponse.replace(/<head>(.|\n)*?<\/head\>/img, '');
 alert("Ups, Error " + jqxhr.status + ", " + textResponse);

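It is used to show the error from an Ajax request. textResponse contains the HTML of the error page returned by the server, and I want to strip the unnecessary parts of that page on the fly, so I am trying to remove the <head> element from the following string: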

<!DOCTYPE html>
<html>
    <head>
        <title>No hay usuario logeado</title>
        <meta name="viewport" content="width=device-width" />
        <style>
         body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;} 
         p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
         b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
         H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
         H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
         pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
         .marker {font-weight: bold; color: black;text-decoration: none;}
         .version {color: gray;}
         .error {margin-bottom: 10px;}
         .expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
         @media screen and (max-width: 639px) {
          pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
         }
         @media screen and (max-width: 479px) {
          pre { width: 280px; }
         }
        </style>
    </head>

    <body bgcolor="white">

            <span><H1>Error de servidor en la aplicación '/HMSW'.<hr width=100% size=1 color=silver></H1>

...

But the string stays exactly the same; nothing is removed.

Any idea why?

Answer 1

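To match across line breaks, use [\s\S] ("whitespace + non-whitespace"), which matches any character, including \r and \n. The m (multiline) flag is not needed: it only changes where ^ and $ match, and this pattern uses neither. The g (global) flag is redundant because there is only one <head>.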
```
textResponse = textResponse.replace(/<head>[\s\S]*?<\/head>/i, '');
```
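
As a small illustration of the difference, here is a sketch that runs both patterns against a made-up error page. The sample string and its \r\n line endings are assumptions, not the asker's actual response. In JavaScript neither . nor \n matches a carriage return, which is one plausible reason the original pattern leaves the string untouched.

```javascript
// Hypothetical error page with Windows-style \r\n line endings (an assumption).
var sample = '<html>\r\n<head>\r\n<title>Error</title>\r\n</head>\r\n<body>Oops</body>\r\n</html>';

// Pattern from the question: "." and "\n" both refuse to match "\r",
// so the match fails and replace() returns the input unchanged.
var original = sample.replace(/<head>(.|\n)*?<\/head\>/img, '');

// [\s\S] matches every character, including "\r" and "\n".
var fixed = sample.replace(/<head>[\s\S]*?<\/head>/i, '');

console.log(original === sample);            // true: nothing was removed
console.log(fixed.indexOf('<head>') === -1); // true: the head block is gone
```

If the response only ever contains \n line endings, the original pattern would match as well, so it is worth checking the raw response before concluding this is the cause.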

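A better approach is to parse the response into a DOM tree and remove the head node.

The advantage is that the parser correctly handles a duplicated <head> or </head> that may be commented out (for example <html><head>......<!-- </head> -->.....</head>).

An example using DOMParser, which is available in modern browsers: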

var doc = new DOMParser().parseFromString(textResponse, "text/html");
doc.head.remove(); // Note: .head node is always present even if empty

The content can then be imported with document.importNode:

var container = document.querySelector(".container");
container.appendChild(document.importNode(doc.querySelector(".something"), true));

Or it can be extracted as HTML: doc.documentElement.outerHTML

P.S. If the XMLHttpRequest responseType is set to document, the parsing step can be skipped:

var xhr = new XMLHttpRequest();
xhr.responseType = "document";
xhr.open("GET", "http://someurl");
xhr.onload = function() {
    var doc = this.responseXML;
    doc.head.remove();
    // ...
};
xhr.send();

Answer 2

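Leaving aside the fact that regular expressions are not suited to parsing HTML, this case is easier to handle if you just find the opening and closing <body> tags and take everything in between. Just do two indexOf() calls and grab what lies between them: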

var fullHTMLStr = '<html><head>blablabla</head><body bgColor="white">Body!</body></html>';
var start = fullHTMLStr.indexOf('<body'); // don't look for '>', there might be attributes
start = fullHTMLStr.indexOf('>', start + 4) + 1; // advance past '>'
var end = fullHTMLStr.indexOf('</body', start);

var justBody = fullHTMLStr.substring(start, end);

alert(justBody);
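
The same idea can be wrapped in a small function that also copes with uppercase tags and with responses that have no <body> at all. This is only a sketch: extractBody is a hypothetical name, and returning the input unchanged when no body is found is an assumed fallback, not something from the answer above.

```javascript
// Hypothetical helper: returns the markup between <body ...> and </body>,
// or the input unchanged when a complete body element cannot be located.
function extractBody(html) {
    var open = html.search(/<body/i);         // case-insensitive, tolerates attributes
    if (open === -1) { return html; }

    var start = html.indexOf('>', open);      // end of the opening tag
    if (start === -1) { return html; }
    start += 1;                               // advance past '>'

    var rest = html.slice(start);
    var close = rest.search(/<\/body/i);      // offset of the closing tag within rest
    if (close === -1) { return html; }

    return rest.substring(0, close);
}

console.log(extractBody('<html><head>x</head><BODY bgColor="white">Body!</BODY></html>'));
```

Like the indexOf() version, this would be fooled by a '>' inside an attribute value or by a commented-out <body> tag, so the DOM-based approach remains the more robust option.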

