Question

I am trying to use a regular expression to search for text content in some HTML files.

I created a regular expression that works, ((?<=>)[^<>]+?(?=([\s\r]*<))), but the search results also include whitespace-only values and leading whitespace.

<h1>test</h1>
<table class="table table-striped table-bordered custom-table" width="100%" align="center" frame="box" bgcolor="white"
    id="dtGrid" style="background:#fff !important;">
    <thead>
        <tr>
            <th>
                Type
            </th>
        </tr>
    </thead>
</table>

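The search results will contain test, Type, and several other results that consist only of whitespace. How can I exclude whitespace-only values and leading whitespace from the search results?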
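The behaviour described above can be reproduced with a short sketch. It assumes Python's standard `re` module and a shortened copy of the sample HTML (the `html` string and the variable names are illustrative, not part of the original question); other engines may report the matches slightly differently.

```python
import re

# Shortened, hypothetical copy of the HTML sample from the question.
html = """<h1>test</h1>
<table id="dtGrid">
    <thead>
        <tr>
            <th>
                Type
            </th>
        </tr>
    </thead>
</table>
"""

# The original pattern from the question, unchanged.
original = re.compile(r"((?<=>)[^<>]+?(?=([\s\r]*<)))")

# Collect the text of every match.
matches = [m.group(1) for m in original.finditer(html)]

# Matches that contain nothing but whitespace.
whitespace_only = [s for s in matches if not s.strip()]

# Matches that carry real text but start with whitespace.
leading_ws = [s for s in matches if s.strip() and s != s.lstrip()]

print(matches)
print(len(whitespace_only), "whitespace-only matches")
print(leading_ws)
```

The whitespace-only entries come from the line breaks between adjacent tags, and the entry for Type keeps the newline and indentation that precede it.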

Answer 1

Try this:

(?<=>)\s*+([^<>]+?(?=(?:[\s\r]*<)))

There is a demo here.

The result will be in the first capturing group.

Here is what I changed from your original regex:

 (?<=>)\s*+([^<>]+?(?=(?:[\s\r]*<)))
^      ^   ^          ^
|      |   |           \__ Used a non-capturing group (just recommended)
|      |   |
|      |   \___ This is now the begin of the capturing group
|      |   
|      \___ Added whitespace with a possessive quantifier (cannot backtrack)
|
\__ Removed beginning of first capturing group
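As a rough illustration of the capturing-group approach, here is a minimal Python sketch. Note the assumptions: Python's `re` module only accepts the possessive `\s*+` from version 3.11 onward, so this sketch swaps in a portable variant that instead requires the first captured character to be non-whitespace. That is a different technique from the possessive quantifier above, chosen only so the example runs on older interpreters. The `html` string is a shortened, hypothetical copy of the sample.

```python
import re

# Shortened, hypothetical copy of the HTML sample from the question.
html = """<h1>test</h1>
<table id="dtGrid">
    <thead>
        <tr>
            <th>
                Type
            </th>
        </tr>
    </thead>
</table>
"""

# (?<=>)          the match must start right after a '>'
# \s*             skip leading whitespace (outside the capturing group)
# ([^<>\s]        group 1 must begin with a non-whitespace character, so
#                 backtracking cannot hand whitespace to the group
#  [^<>]*?)       then lazily take the rest of the text
# (?=\s*<)        stop before trailing whitespace and the next '<'
pattern = re.compile(r"(?<=>)\s*([^<>\s][^<>]*?)(?=\s*<)")

# With a single capturing group, findall returns the group contents.
texts = pattern.findall(html)
print(texts)  # ['test', 'Type']
```

Without the possessive quantifier or the non-whitespace first character, the engine could backtrack `\s*` and let the group match a lone whitespace character, which is exactly the whitespace-only result the question wants to avoid.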

Finally, if you would rather not use a capturing group and want to use the match itself, you can use this regex:

(?<=>)\s*+\K[^<>]+?(?=(?:[\s\r]*<))

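It is similar to the previous one. The difference is that it uses \K to discard the text matched so far, so the leading whitespace is left out of the reported match.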
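To make the effect of \K concrete, the sketch below compares the full match with the first capturing group. It is an illustration under stated assumptions: Python's standard `re` module does not support \K, so the group's span stands in for what a PCRE engine would report as the whole match. The pattern is a portable variant (first captured character must be non-whitespace) rather than the exact regex above, and `snippet` is a made-up input.

```python
import re

# Hypothetical one-cell snippet with leading and trailing whitespace.
snippet = "<th>\n    Type\n</th>"

# Portable stand-in for the regex above: leading whitespace is consumed
# by \s* but kept outside group 1.
pattern = re.compile(r"(?<=>)\s*([^<>\s][^<>]*?)(?=\s*<)")

m = pattern.search(snippet)

# The full match still includes the whitespace consumed by \s*.
print(repr(m.group(0)))  # '\n    Type'

# Group 1 is what a \K placed after the whitespace would report as the match.
print(repr(m.group(1)))  # 'Type'
print(m.span(1))         # (9, 13)
```

In an engine that supports \K (PCRE, for example), the whole match would already be `Type`, so no group lookup is needed.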

There is a new demo here.

Regex to exclude whitespace from selection
