Question

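I'm trying to parse XML documents (specifically Sublime color themes), and I'm using a negative lookahead to prevent matches I don't want, but it doesn't seem to be working.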

The pattern is as follows:

/
<key>name<\/key>
.*?                     # find as little as possible including new lines
<string>(.*?)<\/string> # Match the name of this color Rule
.*?
<dict>
((?!<\/dict>).)*?       # After the second opening <dict>, do not allow a closing </dict>
<key>foreground<\/key>  
.*?
<string>(.*?)<\/string> # Match the hex code for the name found in Match 1.
/mx                     # Treat a newline as a character matched by .
                        # Ignore Whitespace, comments.

The string it matches is:

<dict>
        <key>name</key>
        <string>**Variable**</string>
        <key>scope</key>
        <string>variable</string>
        <key>settings</key>
        <dict>
            <key>fontStyle</key>
            <string></string>
        </dict>
    </dict>

    <dict>
        <key>name</key>
        <string>Keyword</string>
        <key>scope</key>
        <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
        <key>settings</key>
        <dict>
            <key>foreground</key>
            <string>**#F92672**</string>

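The whole string is matched, with the first capture group being **Variable** and the second being **#F92672**. Ideally, I would like the first captured group to be Keyword, from the second part. I assumed the negative lookahead would keep the first part out of the match, because it would see </dict> and fail to match.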

Does anyone know what I'm doing wrong and what I can do to fix it? Thanks!

Answer 1

Here is how to do it with Nokogiri:

require 'nokogiri'

theme = Nokogiri::XML.fragment(xml)
puts theme.xpath('./dict[1]/key[text()="name"]/following-sibling::string[1]').text
#=> "**Variable**"
puts theme.xpath('.//dict[preceding-sibling::key[1][text()="settings"]]/string').text
#=> "**#F92672**"

The first XPath takes the first dict, finds the key containing "name", and then gets the text of the following string element.

The second XPath looks for a dict that immediately follows a key containing "settings", and retrieves the text of its string element.

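Note that if you are parsing a full document rather than the given fragment, you will need to make a few changes, such as changing the call to theme = Nokogiri::XML.parse(xml) and removing the leading . from the XPath expressions.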

Answer 2

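The first dict (the one with **Variable**) and the second dict (the one with Keyword) have the same structure. You want a negative lookahead to tell them apart, but that is not possible.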

For debugging, change ((?!<\/dict>).)*? to (((?!<\/dict>).)*?) and you can see that the new group's content is

result="
        <key>name</key>
        <string>Keyword</string>
        <key>scope</key>
        <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
        <key>settings</key>
        <dict>
            "

This satisfies your negative lookahead.
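
To see this for yourself, here is a minimal sketch that runs the pattern against a trimmed copy of the sample. A few things here are my own assumptions rather than part of the question: the sample is shortened and has its closing tags added, the ** markers are dropped, the literal is written as %r{...} so that </dict> needs no escaping, and the groups are named (name, guarded, hex) instead of numbered.

```ruby
# Trimmed copy of the sample from the question (assumption: shortened,
# closing tags added, ** markers removed).
xml = <<~'XML'
  <dict>
    <key>name</key>
    <string>Variable</string>
    <key>settings</key>
    <dict>
      <key>fontStyle</key>
      <string></string>
    </dict>
  </dict>
  <dict>
    <key>name</key>
    <string>Keyword</string>
    <key>settings</key>
    <dict>
      <key>foreground</key>
      <string>#F92672</string>
    </dict>
  </dict>
XML

# Same shape as the pattern in the question, with the guarded span captured
# in its own group for debugging.
pattern = %r{
  <key>name</key> .*?
  <string>(?<name>.*?)</string>
  .*?
  <dict>
  (?<guarded>(?:(?!</dict>).)*?)   # may not cross a closing dict tag
  <key>foreground</key> .*?
  <string>(?<hex>.*?)</string>
}mx

m = pattern.match(xml)
puts m[:name]     # prints Variable
puts m[:hex]      # prints #F92672
puts m[:guarded]  # the span starts inside the second outer dict
```

The unguarded .*? before <dict> is free to skip past the first rule's closing </dict>, so the <dict> that the lookahead protects ends up being the outer dict of the Keyword rule. From there to <key>foreground</key> there is no </dict> at all, which is why the lookahead never fails.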

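Even if you add more conditions (ones that test only the structure, not the content), the two rules share the same structure, so **Variable** will always be matched ahead of **#F92672**.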

So using an XML parser is probably the better choice.
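
As a rough sketch of what that could look like, the following uses REXML (my choice here, picked because it ships with Ruby; it is not the Nokogiri approach from the other answer). It assumes the rules sit inside a single wrapping <array> element, which I added so the snippet is well-formed, and plist_dict is a hypothetical helper name of mine, not a library function.

```ruby
require 'rexml/document'

# Assumption: the rule dicts are wrapped in one <array> root element.
xml = <<~'XML'
  <array>
    <dict>
      <key>name</key>
      <string>Variable</string>
      <key>settings</key>
      <dict>
        <key>fontStyle</key>
        <string></string>
      </dict>
    </dict>
    <dict>
      <key>name</key>
      <string>Keyword</string>
      <key>settings</key>
      <dict>
        <key>foreground</key>
        <string>#F92672</string>
      </dict>
    </dict>
  </array>
XML

# Hypothetical helper: turn a plist-style <dict> element into a Hash by
# pairing each <key> with the element that follows it.
def plist_dict(dict)
  dict.elements.to_a.each_slice(2).map do |key, value|
    [key.text, value.name == 'dict' ? plist_dict(value) : value.text.to_s]
  end.to_h
end

doc   = REXML::Document.new(xml)
rules = REXML::XPath.match(doc, '/array/dict').map { |d| plist_dict(d) }

# Keep only the rules that actually define a foreground color.
colors = rules.each_with_object({}) do |rule, acc|
  fg = rule.dig('settings', 'foreground')
  acc[rule['name']] = fg if fg
end

colors.each { |name, hex| puts "#{name}: #{hex}" }  # prints Keyword: #F92672
```

Because each name is read from the same dict as its settings, a rule without a foreground (like Variable here) can never be paired with another rule's color.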

Lookahead and lookbehind matching in Ruby
