Question

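I'm trying to extract text strings from an HTML string. I want to capture only the text between tags and skip any empty tags.

My current attempt can be found here: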
https://regex101.com/r/3Ujmw6/2

I tried:

/>(\X+?)</g

// Fails on nested tags: the capture includes the first nested tag
<p><strong>blablab</strong></p>

And:

/>(\X*?)</g

// Finds all the strings, but also includes loads of empty strings
// for adjacent tags ><

Is there a way to exclude < from \X? Or is there a better way to write this so that it returns only the text parts?

Answer 1

Try a regex like this:

>(\s*[^\s<][^<]*)

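This simply matches all text between > and < that is not entirely whitespace: [^\s<] requires at least one character that is neither whitespace nor <, so empty or whitespace-only gaps between adjacent tags are skipped. See https://regex101.com/r/3Ujmw6/4.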
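
For anyone who wants to try the pattern outside regex101, here is a minimal sketch in Python. It uses Python's standard re module rather than the PCRE flavor from the question, and the helper name extract_text and the sample string are made up for illustration. The captured group can still carry leading or trailing whitespace, so the sketch strips each result.

```python
import re

# Pattern from the answer: optional leading whitespace, then one character
# that is neither whitespace nor '<', then everything up to the next '<'.
TEXT_BETWEEN_TAGS = re.compile(r">(\s*[^\s<][^<]*)")


def extract_text(html: str) -> list[str]:
    """Return the non-blank text runs that follow a '>' (hypothetical helper)."""
    # findall returns the contents of the single capture group for each match;
    # strip() removes the surrounding whitespace the pattern lets through.
    return [chunk.strip() for chunk in TEXT_BETWEEN_TAGS.findall(html)]


# Made-up sample with nested tags, adjacent tags, and whitespace-only gaps.
sample = "<div> <p><strong>blablab</strong></p>\n<p>more text</p></div>"
print(extract_text(sample))
```

Note that the pattern only sees text that follows a >, so text before the first tag is not captured, and it knows nothing about comments, scripts, or attribute values that contain >. For arbitrary HTML it is only an approximation.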