Regex: capture a group that does not contain a given pattern

Time: 2018-10-31 08:08:41

Tags: html regex

<DIV><SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> overview of <SPAN CLASS="jl2">Mark</SPAN>, first read all the Division titles (I, II, III, etc.), then come back and read </SPAN></DIV>
    <DIV><SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN></DIV>
    <DIV><SPAN CLASS="dt25 ll2"> </SPAN><SPAN></DIV>
    <DIV><SPAN CLASS="dt26 ll2"> </SPAN></DIV>
    <DIV><SPAN CLASS="dt27 ll2"> </SPAN></DIV>
    <DIV><SPAN CLASS="jl4">UTLINE OF </SPAN>M<SPAN CLASS="jl4">ARK</SPAN> </SPAN></DIV>
    <DIV><SPAN CLASS="dt29 ll2"> </SPAN></DIV>
    <DIV><SPAN CLASS="dt30 ll2"> </SPAN></DIV>

I am trying to retrieve an entire SPAN element here without capturing the opening tag of another SPAN. This regex obviously fails:

<SPAN.*?>(.*?)<\/SPAN>

A sample result of the regex above looks like this:

<SPAN CLASS="ps23 ft0">A suggestion: for the <SPAN CLASS="em2">quickest</SPAN>

This is not what I want. The regex I have written so far to achieve this is:

<SPAN.*?>(.*?(?!<SPAN>.*?).)<\/SPAN>

It fails miserably.
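For reference, the second attempt cannot work as written: the lookahead `(?!<SPAN>.*?)` is checked at only one position (just before the final `.`), and it looks for a literal `<SPAN>` with no attributes, so it never sees `<SPAN CLASS="...">`. A minimal sketch of a "tempered dot" variant follows, assuming the goal is to match only SPAN elements that contain no nested SPAN and that the markup is well-formed. The helper name `innermostSpanContents` and the shortened sample string are illustrative assumptions, not part of the original post.

```javascript
// Sketch only: match SPAN elements whose content holds no further opening <SPAN tag.
// The tempered dot (?:(?!<SPAN\b)[\s\S])*? advances one character at a time
// and refuses to step onto the start of another opening SPAN tag.
const innermostSpan = /<SPAN\b[^>]*>((?:(?!<SPAN\b)[\s\S])*?)<\/SPAN>/gi;

// Hypothetical helper: returns the captured inner text of each matching SPAN.
function innermostSpanContents(html) {
  return [...html.matchAll(innermostSpan)].map((m) => m[1]);
}

// Shortened version of the first line of the sample markup.
const sample =
  '<DIV><SPAN CLASS="dt23 ll0">A suggestion: for the ' +
  '<SPAN CLASS="jl2">quickest</SPAN> overview of ' +
  '<SPAN CLASS="jl2">Mark</SPAN>, first read</SPAN></DIV>';

console.log(innermostSpanContents(sample));
```

On this sample the outer SPAN is skipped because its content contains another opening tag, so only the two inner elements are captured. Malformed markup, such as the stray tags in the sample above, can still confuse a pattern like this.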

1 answer:

Answer 0 (score: 0)

Do not use RegEx on HTML. Use DOM manipulation instead:

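// Collect every <span> element in the document into a real array
const spans = [...document.querySelectorAll("span")];
// Read the text of each span (text of nested spans is included)
const spanContent = spans.map((span) => span.textContent);

console.log(spans);
console.log(spanContent);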
<DIV>
  <SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> overview of
  <SPAN CLASS="jl2">Mark</SPAN>, first read all the Division titles (I, II, III, etc.), then come back and read </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt25 ll2"> </SPAN>
  <SPAN></DIV>
    <DIV><SPAN CLASS="dt26 ll2"> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt27 ll2"> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="jl4">UTLINE OF </SPAN>M
  <SPAN CLASS="jl4">ARK</SPAN> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt29 ll2"> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt30 ll2"> </SPAN>
</DIV>
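
Building on the DOM approach above, here is a minimal sketch for the case where only the innermost SPAN elements are wanted, that is, spans with no nested span. The helper name `innermostSpans` is an assumption for illustration and is not part of the original answer; `root` can be `document` or any element.

```javascript
// Sketch: keep only the <span> elements that contain no nested <span>.
// `root` is anything exposing querySelectorAll, e.g. `document` in a browser.
// innermostSpans is a hypothetical helper name, not from the original answer.
function innermostSpans(root) {
  return [...root.querySelectorAll("span")].filter(
    // querySelector searches descendants only and returns null when none match
    (span) => span.querySelector("span") === null
  );
}

// Only runs where a DOM is available (for example in a browser).
if (typeof document !== "undefined") {
  console.log(innermostSpans(document).map((span) => span.textContent));
}
```

Because the browser's parser repairs stray or unclosed tags before the query runs, this approach does not depend on the markup being well-formed.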