Question

I'm still getting used to regular expressions, so I'm not sure how to get this working.

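I'm not using jQuery, and this isn't the current document; it's HTML obtained from another source as a string. I don't care about the <br /> tags outside of <p> tags, so I want to strip them out. I want to keep the <br /> tags inside <p> tags to preserve their line breaks.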

I need to change this:

<body><br /><p>hello<br />there</p><br /></body>

to this:

<body><p>hello<br />there</p></body>

What regular expression would I use to accomplish this?

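Edit: more information. I'm trying to do this server-side with Node.js, so I don't have access to DOMParser, but I am using html-dom-parser. I'm stripping out these outer <br /> tags before passing the string to that parser, to reduce the size of the resulting DOM tree object.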

Answer 1

You can parse the HTML content with DOMParser, then use the :not() pseudo-class selector to match every element that is not a p tag, and the > (direct child selector) to get the br tags that are its direct children (to avoid nested ones).

Parsing HTML with RegExp is a bad idea:

Using regular expressions to parse HTML: why not?

RegEx match open tags except XHTML self-contained tags

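In the browser, the DOMParser approach looks like this:

let html = `<body><br /><p>hello<br />there</p><br /></body>`;

const parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");

// Remove every <br> whose direct parent is not a <p>
doc.querySelectorAll(':not(p) > br').forEach(ele => ele.remove());

console.log(doc.body.outerHTML);

For Node.js, using the jsdom library, it looks much the same:

const { JSDOM } = require('jsdom');

let html = `<body><br />
  <p>hello<br />there</p><br /></body>`;

const dom = new JSDOM(html);

// Remove every <br> whose direct parent is not a <p>
dom.window.document.querySelectorAll(':not(p) > br').forEach(ele => ele.remove());

console.log(dom.window.document.body.outerHTML);

Update: if there is a chance that a br tag is nested deeper inside the p tag, check its ancestor elements before removing it.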
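The ancestor check mentioned in the update can be sketched as a walk up the parentNode chain. This is only an illustration: hasAncestorTag is a hypothetical helper name, and the plain objects below stand in for real DOM nodes so the snippet runs without a DOM library.

```javascript
// Hypothetical helper: true if any ancestor of `node` has the given tag name.
function hasAncestorTag(node, tagName) {
  const wanted = tagName.toUpperCase();
  for (let cur = node.parentNode; cur; cur = cur.parentNode) {
    // Element nodes expose tagName; other node types (e.g. the document) do not.
    if (cur.tagName && cur.tagName.toUpperCase() === wanted) return true;
  }
  return false;
}

// Plain-object stand-ins for DOM nodes: body > p > u > br, and body > br
const body = { tagName: 'BODY', parentNode: null };
const p = { tagName: 'P', parentNode: body };
const u = { tagName: 'U', parentNode: p };
const nestedBr = { tagName: 'BR', parentNode: u };
const outerBr = { tagName: 'BR', parentNode: body };

console.log(hasAncestorTag(nestedBr, 'p')); // true: keep this <br>
console.log(hasAncestorTag(outerBr, 'p'));  // false: safe to remove
```

Against a real document, the same check would presumably be used as querySelectorAll('br').forEach(br => { if (!hasAncestorTag(br, 'p')) br.remove(); }).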

Answer 2

Building on Pranav C Balan's answer:

The code <...>.querySelectorAll(':not(p) > br').forEach(ele => ele.remove()) is dangerous, because it also removes every <br> inside a <p> when that <br> is itself nested in a non-<p> tag.

let html = `<body><br>
  <p>hello <u>underline<br>underline</u><br>there </p><br></body>`;


let parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");


doc.querySelectorAll(':not(p) > br').forEach(ele => ele.remove())

console.log(doc.body.outerHTML)

console.log(`This should've been:
<body>
  <p>hello <u>underline<br>underline</u><br>there </p></body>`)

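To make this work, we need to get all <br> elements and check whether they sit inside a <p> element, as a direct descendant or otherwise. With jQuery you would use the closest method. We can use the vanilla JS approach described here: PlainJS - Get closest element by selector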

/** source: https://plainjs.com/javascript/traversing/get-closest-element-by-selector-39/ */
// matches polyfill
this.Element && function(ElementPrototype) {
    ElementPrototype.matches = ElementPrototype.matches ||
    ElementPrototype.matchesSelector ||
    ElementPrototype.webkitMatchesSelector ||
    ElementPrototype.msMatchesSelector ||
    function(selector) {
        var node = this, nodes = (node.parentNode || node.document).querySelectorAll(selector), i = -1;
        while (nodes[++i] && nodes[i] != node);
        return !!nodes[i];
    }
}(Element.prototype);

// closest polyfill
this.Element && function(ElementPrototype) {
    ElementPrototype.closest = ElementPrototype.closest ||
    function(selector) {
        var el = this;
        while (el.matches && !el.matches(selector)) el = el.parentNode;
        return el.matches ? el : null;
    }
}(Element.prototype);


let html = `<body><br>
  <p>hello <u>underline<br>underline</u><br>there </p><br></body>`;


let parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");


doc.querySelectorAll(':not(p) > br').forEach(ele => {
    if (!ele.closest('p')) {
      ele.remove()
    }
  })

console.log(doc.body.outerHTML)
console.log(`That should be:
<body>
  <p>hello <u>underline<br>underline</u><br>there </p></body>`)

Addendum:

If you need to put a space where each removed <br> was, so that a<br>b becomes a b rather than ab, you can use this function inside forEach:

elm => {
    if (!elm.closest('p')) {
        // Use the element's own document so this also works on a parsed (non-global) document
        elm.parentNode.insertBefore(elm.ownerDocument.createTextNode(' '), elm);
        elm.remove();
    }
}
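If the cleanup has to happen on the raw string before any DOM exists, as in the question's html-dom-parser setup, one possible middle ground is a single pass over the tags that tracks how many <p> elements are currently open. The sketch below is not an HTML parser: stripBrOutsideP is a hypothetical name, and it assumes well-formed markup with explicit </p> closing tags and no <p> or <br> text inside comments, scripts, or attribute values. The optional second argument covers the space-instead-of-nothing case from the addendum.

```javascript
// Hypothetical helper: removes <br> tags that are not inside any <p> element.
// Assumes well-formed markup with explicit </p> tags; not a general HTML parser.
function stripBrOutsideP(html, replacement = '') {
  // Matches <p ...>, </p>, <br>, <br/> and <br />, but not <pre>, <body>, etc.
  const tagPattern = /<(\/?)(p|br)(?=[\s/>])[^>]*>/gi;
  let depth = 0; // number of <p> elements currently open
  return html.replace(tagPattern, (tag, slash, name) => {
    if (name.toLowerCase() === 'p') {
      depth = slash ? Math.max(0, depth - 1) : depth + 1;
      return tag; // <p> and </p> are always kept
    }
    // A <br> is kept only while at least one <p> is open.
    return depth > 0 ? tag : replacement;
  });
}

console.log(stripBrOutsideP('<body><br /><p>hello<br />there</p><br /></body>'));
// <body><p>hello<br />there</p></body>

console.log(stripBrOutsideP('a<br>b', ' '));
// a b
```

Because the depth counter is the only state, a <br> nested inside <u> inside <p> is kept without any ancestor lookup.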

How do I use a regular expression to remove all <br/> tags outside of <p> tags?
