Parsing text between quotes with a .NET regular expression

Date: 2015-05-04 00:30:41

Tags: c# .net regex lookahead lookbehind

I have the following input text:

@"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38"

I want to parse the values written in @name=value syntax as name/value pairs. Parsing the string above should produce the following named captures:

name:"foo"
value:"bar"

name:"name"
value:"John \""The Anonymous One\"" Doe"

name:"age"
value:"38"

I tried the following regular expression, which gets me almost there:

@"(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"

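The main problem is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel this should be a lookbehind rather than a lookahead, but that does not seem to work at all.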
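The behavior described above can be reproduced with a short sketch. A lookahead only asserts and consumes nothing, so `.+?` starts matching at the quote itself. Swapping in a lookbehind such as `(?<="")` does not help either, because at that point the preceding character is `=`, not a quote. The class and method names below (`LookaheadDemo`, `ValueOf`) are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern =
        @"(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))";

    // The input text from the question.
    public const string Input =
        @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";

    // Returns the captured value for the given name, or null if no pair has that name.
    public static string ValueOf(string name)
    {
        foreach (Match m in Regex.Matches(Input, Pattern))
        {
            if (m.Groups["name"].Value == name)
                return m.Groups["value"].Value;
        }
        return null;
    }
}
```

Calling `LookaheadDemo.ValueOf("name")` returns the text with the opening quote still attached, while `ValueOf("foo")` and `ValueOf("age")` return the unquoted values as expected.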

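Here are the rules for the expression:

  • Names must start with a letter and may contain any letter, digit, underscore, or hyphen.

  • Unquoted values must contain at least one character and may contain any letter, digit, underscore, or hyphen.

  • Quoted values may contain any character, including whitespace and escaped quotes.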
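As a rough sketch, each rule can be written as its own anchored pattern, which makes the pieces easier to check one at a time before they are combined. The `Rules` class and its field names are hypothetical. The quoted-value pattern assumes that a backslash escapes whatever character follows it, which is one reading of "escaped quotes" and not something the rules spell out.

```csharp
using System.Text.RegularExpressions;

public static class Rules
{
    // Rule 1: a letter first, then any letters, digits, underscores, or hyphens.
    public static readonly Regex Name =
        new Regex(@"\A[A-Za-z][A-Za-z0-9_-]*\z");

    // Rule 2: one or more letters, digits, underscores, or hyphens.
    public static readonly Regex UnquotedValue =
        new Regex(@"\A[A-Za-z0-9_-]+\z");

    // Rule 3 (assumed escaping): between the quotes, allow either a
    // backslash followed by any character, or any character that is
    // neither a quote nor a backslash. Singleline lets "." match newlines.
    public static readonly Regex QuotedValue =
        new Regex(@"\A""(?:\\.|[^""\\])*""\z", RegexOptions.Singleline);
}
```

For example, `Rules.Name.IsMatch("1abc")` is false because of the leading digit, while `Rules.QuotedValue.IsMatch(...)` accepts the quoted name from the input text, escaped quotes included.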

Edit:

Here is the result from regex101.com:

(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))

(?:(?<=\s)|^) Non-capturing group
@ matches the character @ literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
    1st Alternative: [A-Za-z0-9_-]+
        [A-Za-z0-9_-]+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            A-Z a single character in the range between A and Z (case sensitive)
            a-z a single character in the range between a and z (case sensitive)
            0-9 a single character in the range between 0 and 9
            _- a single character in the list _- literally
    2nd Alternative: (?=").+?(?=(?<!\\)")
        (?=") Positive Lookahead - Assert that the regex below can be matched
            " matches the characters " literally
        .+? matches any character (except newline)
            Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
        (?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
            (?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
                \\ matches the character \ literally
            " matches the characters " literally

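2 Answers:

Answer 0 (score: 1)

You can use a very useful .NET regex feature that allows multiple capture groups with the same name. Also, your (?<name>) capture group has a problem: it allows a digit in the first position, which does not meet your first requirement.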

So, I suggest:

(?si)(?:(?<=\s)|^)@(?<name>[a-z][a-z0-9_-]*)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))

See the demo.

Note that you cannot debug .NET-specific regular expressions on regex101.com; you need to test them in a .NET-compatible environment.
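A minimal sketch of how the same-named-group approach might be wired up in C#. It writes the name group as `[a-z][a-z0-9_-]*` so that the letter-first rule holds, and it consumes the opening quote outside the `value` group so the quote is not captured. The `PairParser` class and `Parse` method are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairParser
{
    // Both alternatives capture into a group called "value"; .NET treats
    // same-named groups as one group, so either branch fills it.
    private static readonly Regex PairRegex = new Regex(
        @"(?si)(?:(?<=\s)|^)@(?<name>[a-z][a-z0-9_-]*)\s*=\s*" +
        @"(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))");

    // Returns the name/value pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairRegex.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value,
                m.Groups["value"].Value));
        }
        return pairs;
    }
}
```

Run against the input text from the question, `PairParser.Parse` yields three pairs: foo/bar, name with the quoted text minus its surrounding quotes, and age/38. Note that the `(?<!\\)` check treats any quote preceded by a backslash as escaped, so a value that ends in a literal backslash would not terminate where expected.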

Answer 1 (score: 0)

Use string methods.

Split

string myLongString = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";

string[] nameValues = myLongString.Split('@');

From there, split each piece on "=" or use IndexOf("=").
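The string-method approach could be sketched roughly as follows. `SplitParser` and `Parse` are hypothetical names. The sketch assumes that no value contains an `@`, that text before the first `@` contains no `=`, and that nothing follows an unquoted value before the next `@`. It does not validate names or values against the rules in the question.

```csharp
using System.Collections.Generic;

public static class SplitParser
{
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (string piece in input.Split('@'))
        {
            // Pieces without "=" (such as the leading free text) are skipped.
            int eq = piece.IndexOf('=');
            if (eq <= 0) continue;

            string name = piece.Substring(0, eq).Trim();
            string value = piece.Substring(eq + 1).Trim();

            // Strip one pair of surrounding quotes; escaped quotes inside
            // the value are left exactly as written.
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
                value = value.Substring(1, value.Length - 2);

            result[name] = value;
        }
        return result;
    }
}
```

This is simpler than a regex but more fragile: an `@` inside a quoted value, for instance an e-mail address, would split the value in two.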