Question

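I am using the regular-expression search in Notepad++ to find matches across several hundred files.

My goal is to find each parent/child combination of a particular element. I don't much care what exactly gets selected (the parent together with the child, or just the child); I only want to know whether a parent contains a specific child.

I want to find a <description> parent element that also has a <description> child element.

Example of what should be found (because one of the child elements is <description>):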

<description>
    <otherstuff>
    </otherstuff>
    <something>
    </something>
    <description>
    </description>
    <otherstuff>
    </otherstuff>
</description>

Example of what should not be found:

<description>
    <otherstuff>
    </otherstuff>
    <something>
    </something>
    <notadescription>
    </notadescription>
    <otherstuff>
    </otherstuff>
</description>

Each element may also have other children and grandchildren. Both kinds of element may appear in the same document.

If I search for this:

<description>(.*)<description>(.*)</description>

it selects too much, because it picks up another top-level <description> when I only want it to select the second, child one.
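The over-matching described above can be reproduced outside Notepad++. The following is a minimal sketch in Python, assuming that `re.DOTALL` behaves like the ". matches newline" option; the sample document is hypothetical and not taken from the real files.

```python
import re

# Hypothetical sample: two sibling top-level <description> elements.
# Only the second one contains a nested <description> child.
doc = """<description>
    <notadescription>
    </notadescription>
</description>
<description>
    <description>
    </description>
</description>"""

# re.DOTALL is assumed to mimic Notepad++'s ". matches newline" option.
greedy = re.compile(r"<description>(.*)<description>(.*)</description>", re.DOTALL)

m = greedy.search(doc)

# The match begins at the first top-level element, which has no such child,
# and runs through to the end of the second one.
print(m.start())           # 0
print(m.group(0) == doc)   # True
```

The first `(.*)` is free to run across the closing tag of the first element, which is why the match spills over from one top-level element into the next.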

Answer 1

You said you are using Notepad++; here is one way to do it:

Ctrl + F
Find what: <description>(?:(?!</description).)*<description>(?:(?!<description>).)*</description>
Check Match case
Check Wrap around
Check Regular expression
Check . matches newline

Explanation:

<description>               # opening tag
(?:(?!</description).)*     # tempered greedy token: any characters, as long as no closing tag starts among them
<description>               # opening tag (the child)
(?:(?!<description>).)*     # tempered greedy token: any characters, as long as no opening tag starts among them
</description>              # closing tag
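To check the pattern outside Notepad++, here is a minimal Python sketch. It assumes `re.DOTALL` is an adequate stand-in for ". matches newline", and the sample strings are hypothetical, modeled on the question's two examples.

```python
import re

# Answer 1's pattern; re.DOTALL stands in for ". matches newline".
pattern = re.compile(
    r"<description>(?:(?!</description).)*"
    r"<description>(?:(?!<description>).)*</description>",
    re.DOTALL,
)

# Hypothetical samples modeled on the question's two examples.
with_child = """<description>
    <something>
    </something>
    <description>
    </description>
</description>"""

without_child = """<description>
    <something>
    </something>
    <notadescription>
    </notadescription>
</description>"""

print(bool(pattern.search(with_child)))     # True
print(bool(pattern.search(without_child)))  # False

# With both elements in one document, the match starts at the parent that
# really has the child instead of spilling over from the earlier sibling.
combined = without_child + "\n" + with_child
match = pattern.search(combined)
print(match.start() == len(without_child) + 1)  # True
```

The first tempered token cannot cross a closing `</description`, so a match that starts in an element without the child fails rather than reaching into a later sibling.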


Answer 2

You should not use (.*), because it is too greedy. Here is an example of why you should not use it:

<description>
    <otherstuff>
    </otherstuff>
    <description>
        <description>hello</description>
    </description>
</description>

Suppose we use <description>(.*)<description>(.*)</description> here. It will match:

    <description>
        <description>hello</description>
    </description>
</description>

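So if you only want to match the content of the second description, you should use (.*?), which is called non-greedy (lazy) matching. Using <description>(.*)<description>(.*?)</description> will match: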

<description>
    <description>hello</description> # end of match
# the outer </description> is not included here, because (.*?) stops at the first closing tag it finds

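So you have to use (.*?): it stops matching as soon as it finds the first closing match, whereas (.*) is greedy and looks for the largest possible match.

So using <description>(.*)<description>(.*?)</description> should work well, because in your case it only captures the content of the child description.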
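The difference between the greedy and the lazy second group can be checked with a short Python sketch. It assumes `re.DOTALL` in place of Notepad++'s ". matches newline", and the sample is a hypothetical, well-formed version of the snippet above.

```python
import re

# Hypothetical well-formed sample with three nested <description> levels.
sample = """<description>
    <otherstuff>
    </otherstuff>
    <description>
        <description>hello</description>
    </description>
</description>"""

greedy = re.compile(r"<description>(.*)<description>(.*)</description>", re.DOTALL)
lazy = re.compile(r"<description>(.*)<description>(.*?)</description>", re.DOTALL)

g = greedy.search(sample)
l = lazy.search(sample)

# Greedy second group: runs on past the first closing tag.
print("</description>" in g.group(2))  # True

# Lazy second group: stops at the first closing tag.
print(repr(l.group(2)))                # 'hello'
```

Only the second group is lazy here; the first `(.*)` is still greedy, so the overall match still begins at the first `<description>` in the text.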

Answer 3

My guess is that we want to design an expression that excludes <notadescription>, for example:

<description>(?!<notadescription>)[\s\S]*<\/description>

If you want to capture the description element, you may need a capturing group:

(<description>(?!<notadescription>)[\s\S]*<\/description>)
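Before relying on this expression, it may help to see where the negative lookahead is applied. The Python sketch below uses two hypothetical inputs: one where <notadescription> follows the opening tag directly, and one where whitespace comes in between, as in the question's examples.

```python
import re

# Answer 3's pattern; [\s\S] matches any character, including newlines.
pattern = re.compile(r"<description>(?!<notadescription>)[\s\S]*<\/description>")

# Hypothetical input: the unwanted child follows the opening tag directly.
adjacent = "<description><notadescription></notadescription></description>"

# Hypothetical input: a newline and indentation come first.
indented = """<description>
    <notadescription>
    </notadescription>
</description>"""

print(pattern.search(adjacent))        # None
print(bool(pattern.search(indented)))  # True
```

The lookahead is evaluated only at the position immediately after the opening tag, so with indented input like the question's examples the expression matches whether or not a <description> child is present.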

Regex to find a child element in XML
