<DIV><SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> overview of <SPAN CLASS="jl2">Mark</SPAN>, first read all the Division titles (I, II, III, etc.), then come back and read </SPAN></DIV>
<DIV><SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN></DIV>
<DIV><SPAN CLASS="dt25 ll2"> </SPAN><SPAN></DIV>
<DIV><SPAN CLASS="dt26 ll2"> </SPAN></DIV>
<DIV><SPAN CLASS="dt27 ll2"> </SPAN></DIV>
<DIV><SPAN CLASS="jl4">UTLINE OF </SPAN>M<SPAN CLASS="jl4">ARK</SPAN> </SPAN></DIV>
<DIV><SPAN CLASS="dt29 ll2"> </SPAN></DIV>
<DIV><SPAN CLASS="dt30 ll2"> </SPAN></DIV>
I am trying to retrieve an entire SPAN element here without capturing the opening tag of another SPAN. This regular expression obviously fails:
<SPAN.*?>(.*?)<\/SPAN>
A sample result from the regular expression above looks like this:
<SPAN CLASS="ps23 ft0">A suggestion: for the <SPAN CLASS="em2">quickest</SPAN>
This is not what I want. The regular expression I have written so far to achieve this is:
<SPAN.*?>(.*?(?!<SPAN>.*?).)<\/SPAN>
It fails miserably.
Answer 0 (score: 0)
Do not use regular expressions on HTML. Use DOM manipulation instead:
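// Collect every <span> element in the document, nested ones included.
const spans = [...document.querySelectorAll("span")];
// textContent returns each span's text, including the text of nested spans.
const spanContent = spans.map((span) => span.textContent);
console.log(spans);
console.log(spanContent);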
<DIV>
<SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> overview of
<SPAN CLASS="jl2">Mark</SPAN>, first read all the Division titles (I, II, III, etc.), then come back and read </SPAN>
</DIV>
<DIV>
<SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN>
</DIV>
<DIV>
<SPAN CLASS="dt25 ll2"> </SPAN>
<SPAN></DIV>
<DIV><SPAN CLASS="dt26 ll2"> </SPAN>
</DIV>
<DIV>
<SPAN CLASS="dt27 ll2"> </SPAN>
</DIV>
<DIV>
<SPAN CLASS="jl4">UTLINE OF </SPAN>M
<SPAN CLASS="jl4">ARK</SPAN> </SPAN>
</DIV>
<DIV>
<SPAN CLASS="dt29 ll2"> </SPAN>
</DIV>
<DIV>
<SPAN CLASS="dt30 ll2"> </SPAN>
</DIV>
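For anyone curious why the patterns in the question behave the way they do, here is a small sketch that runs without a browser. It reproduces the first pattern's result on a shortened copy of the sample markup, then tries a tempered pattern that refuses to cross another `<SPAN`. The `innermost` pattern and the variable names are my own illustration, not something from the question, and it only finds spans that contain no nested span. It also assumes uppercase tag names with no `>` inside attribute values, so the DOM approach above remains the more robust choice. As far as I can tell, the second attempt in the question fails because its lookahead tests for the literal `<SPAN>` (no attributes) and only at a single position.

```javascript
// Shortened copy of the first line of the sample markup from the question.
const html =
  '<DIV><SPAN CLASS="dt23 ll0">A suggestion: for the ' +
  '<SPAN CLASS="jl2">quickest</SPAN> overview of ' +
  '<SPAN CLASS="jl2">Mark</SPAN>, first read </SPAN></DIV>';

// First pattern from the question: the match starts at the first "<SPAN"
// and stops at the first "</SPAN>", so the outer opening tag is captured
// together with the inner element.
const lazy = /<SPAN.*?>(.*?)<\/SPAN>/;
const lazyMatch = html.match(lazy)[0];

// Hypothetical alternative (an assumption, not from the question): every
// character of the body is checked so that it does not begin another "<SPAN".
// Only spans without a nested span can match.
const innermost = /<SPAN[^>]*>((?:(?!<SPAN)[\s\S])*?)<\/SPAN>/g;
const innerTexts = [...html.matchAll(innermost)].map((m) => m[1]);

console.log(lazyMatch);
console.log(innerTexts);
```

Here `lazyMatch` shows the same kind of unwanted result the question describes, while `innerTexts` holds only the text of the two nested spans. The outer span is skipped entirely, which may or may not be what you want.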