Question

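What do I need to change in this regular expression so that, in both cases below, the text before the first colon is captured as the "label" and all the rest of the text is captured as the "text"?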

using System;
using System.Text.RegularExpressions;

namespace TestRegex92343
{
    class Program
    {
        static void Main(string[] args)
        {
            {
                //THIS WORKS:
                string line = "title: The Way We Were";
                Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
                Match match = regex.Match(line);
                Console.WriteLine("LABEL IS: {0}", match.Groups["label"]); //"title"
                Console.WriteLine("TEXT IS: {0}", match.Groups["text"]); //"The Way We Were"
            }

            {
                //THIS DOES NOT WORK:
                string line = "title: The Way We Were: A Study of Youth";
                Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
                Match match = regex.Match(line);
                Console.WriteLine("LABEL IS: {0}", match.Groups["label"]);
                //GETS "title: The Way We Were"
                //SHOULD GET: "title"
                Console.WriteLine("TEXT IS: {0}", match.Groups["text"]);
                //GETS: "A Study of Youth"
                //SHOULD GET: "The Way We Were: A Study of Youth"
            }

            Console.ReadLine();
        }
    }
}

Answer 1

new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");

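This just replaces the dot with the character class [^:], which means any character except a colon. The label can therefore never run past the first colon.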
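To check this against both lines from the question, here is a minimal sketch. The LabelSplitter class and its Split method are names made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
static class LabelSplitter
{
    // [^:]+ cannot cross a colon, so the label group stops at the first one.
    static readonly Regex Pattern = new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");

    // Splits "label: text" at the first colon; throws if the line has no such pair.
    public static (string Label, string Text) Split(string line)
    {
        Match match = Pattern.Match(line);
        if (!match.Success)
            throw new FormatException("No 'label: text' pair found in: " + line);
        return (match.Groups["label"].Value, match.Groups["text"].Value);
    }
}
```

Calling LabelSplitter.Split("title: The Way We Were: A Study of Youth") should give the label "title" and the text "The Way We Were: A Study of Youth". The pattern is not anchored, so if a line could start with a colon you may want to add ^ at the front.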

Answer 2

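Regex quantifiers are greedy by default, and . matches any character. That is why the label swallows everything up to the last colon. If your label is always made up of word characters only, I would suggest the following: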

(?<label>\w+):\s*(?<text>.+)

Otherwise, you can make the expression non-greedy:

(?<label>.+?):\s*(?<text>.+)

As a rule, avoid greedy operators where you can, and always try to match specifically what you want.
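To see the difference side by side, here is a small sketch that runs the original greedy pattern and the two alternatives from this answer over the same input. QuantifierDemo and Capture are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
static class QuantifierDemo
{
    public const string Greedy = @"(?<label>.+):\s*(?<text>.+)";  // pattern from the question
    public const string Lazy   = @"(?<label>.+?):\s*(?<text>.+)"; // non-greedy label
    public const string Word   = @"(?<label>\w+):\s*(?<text>.+)"; // word characters only

    // Returns the two named groups for the first match of pattern in line.
    // Both values are empty strings when the pattern does not match.
    public static (string Label, string Text) Capture(string pattern, string line)
    {
        Match match = Regex.Match(line, pattern);
        return (match.Groups["label"].Value, match.Groups["text"].Value);
    }
}
```

With the line "title: The Way We Were: A Study of Youth", Greedy yields the label "title: The Way We Were", while Lazy and Word both yield "title" with the full remainder as the text. Note that \w+ does not match spaces, so for a label like "full title" the unanchored Word pattern would capture only "title".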
