Question

Consider the following input string:

X=Y
Z=U
Q=P


Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s

I would like to know whether it is possible to capture the following with a one-line regular expression:

Left: X
Right: Y
Left: Z
Right: U
Left: Q
Right: P

Text: Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s

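The idea is that there are a bunch of lines in a specific format, followed by a "\r\n", followed by some text. I want to capture each key-value pair (in this case) and the text separately.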

Capturing the structured data is simple enough (this is just an example):

(?:^(?<left>\S+)=(?<right>\S+)\n)

But I can't figure out how to specify:

"Keep capturing this pattern until the first blank line, then take everything after it and capture it as 'text'."

Solving this in code is easy, but I'm really interested in learning whether it can be done with a single regex one-liner.

Answer 1

Yes. In .NET (and in very few other regex engines) you can repeat a capturing group and retrieve the capture from each repetition:

^               # anchor pattern to the beginning of the string
(?:             # non-capturing group for a single x=y line
  (?<left>\S+)  # match and capture left-hand side
  =
  (?<right>\S+) # match and capture right-hand side
  \n
)+              # repeat
\n              # match the blank line that ends the x=y block
(?<text>.*)     # match the remainder of the string
$               # anchor pattern to the end of the string (not really necessary)

Make sure to use RegexOptions.IgnorePatternWhitespace and RegexOptions.Singleline.

If your Match object is called m, you can now retrieve:

m.Groups["left"].Captures  // for a list of all left-hand sides
m.Groups["right"].Captures // for a list of all right-hand sides
m.Groups["text"].Value     // for the remainder of the string
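
Putting the pieces together, here is a minimal sketch of a complete console program. It assumes the input uses plain "\n" line endings (with "\r\n" the pattern as written would not match), and the class name, variable names, and sample input are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

class CaptureDemo
{
    static void Main()
    {
        // Assumption: lines are separated by "\n", not "\r\n".
        string input =
            "X=Y\nZ=U\nQ=P\n\n" +
            "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n" +
            "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s";

        // Same pattern as above; the layout and comments are allowed
        // because of RegexOptions.IgnorePatternWhitespace.
        string pattern = @"
            ^                 # beginning of the string
            (?:               # one x=y line
              (?<left>\S+)
              =
              (?<right>\S+)
              \n
            )+                # repeat for every x=y line
            \n                # the blank line
            (?<text>.*)       # the remainder of the string
            $";

        Match m = Regex.Match(
            input,
            pattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

        if (!m.Success)
        {
            Console.WriteLine("No match");
            return;
        }

        // Each repetition of the group adds one Capture to both collections,
        // so the two collections have the same length and line up by index.
        CaptureCollection lefts = m.Groups["left"].Captures;
        CaptureCollection rights = m.Groups["right"].Captures;
        for (int i = 0; i < lefts.Count; i++)
        {
            Console.WriteLine("Left: " + lefts[i].Value);
            Console.WriteLine("Right: " + rights[i].Value);
        }

        string text = m.Groups["text"].Value;
        Console.WriteLine("Text: " + text);
    }
}
```

Note that m.Groups["left"].Value on its own would only give the last repetition (Q here); the Captures collection is what keeps every repetition.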
