A string with a "tight" pattern of repeated key/value pairs (in this example the key is "name" and the value should be a single lowercase word)
string text = "name: abc name: def name: ghi name: jkl";
should be converted to the output
abc,def,ghi,jkl,
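whereas any disturbance in the pattern (a "non-tight" input), as in
string text = "name: abc x name: def name: ghi name: jkl";
should cause the match to fail, with output similar to
abc,## exception occurred: x cannot be matched to the pattern ##
I tried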
string text = "name: abc name: def name: ghi name: jkl";
string pattern = @"name:\s*([a-z])*\s*";
MatchCollection ms = Regex.Matches(text, pattern);
foreach (Match m in ms)
{
    Console.Write(m.Groups[1].Value+", ");
}
but it returns
c,f,i,l,
What causes this strange behavior, and how can I fix it?
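Answer 0 (score: 4)
You only need to move the * inside the parentheses so that the group captures the whole word. With ([a-z])* the group matches one letter per repetition, and Groups[1].Value holds only the last repetition, which is why you get c, f, i, l.
If you want to guard against invalid input, you don't necessarily need a regular expression. This assumes your values cannot contain spaces, because that would be a much harder problem to solve.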
void Main()
{
    string validText = "name: abc name: def name: ghi name: jkl";
    string invalidText = "name: abc x name: def name: ghi name: jkl";
    string validPattern = @"name:\s*([a-z]*)\s*";

    // Validate returns false when a key shows up where a value is expected.
    if (!Validate(invalidText))
    {
        try
        {
            throw new Exception("invalid input");
        }
        catch (Exception exception)
        {
            Console.WriteLine($"Input '{invalidText}' produces: {exception.Message}");
        }
    }

    MatchCollection ms = Regex.Matches(validText, validPattern);
    Console.Write($"Input '{validText}' produces: ");
    foreach (Match m in ms)
    {
        Console.Write(m.Groups[1].Value + ", ");
    }
}

// Splits on spaces and checks that no token in a value position (odd index) is a key.
public static bool Validate(string input)
{
    var pairs = input.Split(' ');
    return !pairs.Where((item, index) => index % 2 != 0).Any(item => item.EndsWith(":"));
}

// Input 'name: abc x name: def name: ghi name: jkl' produces: invalid input
// Input 'name: abc name: def name: ghi name: jkl' produces: abc, def, ghi, jkl,
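To make the cause of the original c,f,i,l, output concrete, here is a minimal sketch that runs both patterns over the same input and prints the value of group 1 for each match. The class name QuantifierPlacementDemo and the helper GroupValues are made up for this illustration and are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierPlacementDemo
{
    // Hypothetical helper: returns the value of group 1 for every match.
    public static string[] GroupValues(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "name: abc name: def";

        // Quantifier outside the group: the group matches one letter per
        // repetition, and Group.Value keeps only the last repetition.
        Console.WriteLine(string.Join(",", GroupValues(text, @"name:\s*([a-z])*\s*")));
        // prints "c,f"

        // Quantifier inside the group: the group matches the whole word at once.
        Console.WriteLine(string.Join(",", GroupValues(text, @"name:\s*([a-z]*)\s*")));
        // prints "abc,def"
    }
}
```

Note that neither pattern rejects the stray x on its own; Regex.Matches simply skips over text that does not match, which is why the answer validates the input separately.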
Answer 1 (score: 1)
Couldn't you just use Split? Trimming each piece removes the space left in front of the next key:
var result = "name: abc name: def name: ghi name: jkl".Split(new [] { "name: " }, StringSplitOptions.RemoveEmptyEntries).Select(a => a.Trim()).ToArray();
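Splitting alone does not report a "non-tight" input, so a check on each piece is still needed. The following is only a sketch under the question's assumption that a value is a single lowercase word; SplitDemo, SplitValues and AllValuesValid are hypothetical names introduced here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on the "name: " key and trims the space left before the next key.
    public static string[] SplitValues(string text)
    {
        return text.Split(new[] { "name: " }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(piece => piece.Trim())
                   .ToArray();
    }

    // Assumed rule from the question: every value is one lowercase word.
    public static bool AllValuesValid(string[] values)
    {
        return values.All(v => Regex.IsMatch(v, "^[a-z]+$"));
    }

    public static void Main()
    {
        foreach (string text in new[] { "name: abc name: def", "name: abc x name: def" })
        {
            string[] values = SplitValues(text);
            Console.WriteLine(string.Join(",", values) + " valid=" + AllValuesValid(values));
        }
    }
}
```

For the second input the first piece becomes "abc x", which fails the check. A limitation of this sketch: stray text before the first key, as in "x name: abc", ends up as its own piece and passes the check as if it were a value.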
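Answer 2 (score: 1)
Unlike most other regex engines, the C# (.NET) engine actually keeps track of repeated captures, through the Captures property of the Group class.
Group.Captures Property
Gets a collection of all the captures matched by the capturing group, in innermost-leftmost-first order (or innermost-rightmost-first order if the regular expression is modified with the RegexOptions.RightToLeft option).
This means that by accessing Groups[1] (as in the code below) and then its Captures property, we effectively get the value of each repeated capture.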
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Example {
    static void Main() {
        string[] strings = new string[]{
            "name: abc name: def name: ghi name: jkl",
            "name: abc x name: def name: ghi name: jkl"
        };
        // Anchored with ^ and $ so the whole input must consist of tight pairs.
        Regex regex = new Regex(@"^(?:name: *([a-z]+) *)+$");
        foreach(string s in strings) {
            Match match = regex.Match(s);
            if(match.Success) {
                // Captures holds every repetition of group 1, not only the last one.
                Console.WriteLine(string.Join(", ", from Capture c in match.Groups[1].Captures select c.Value));
            } else {
                Console.WriteLine("Invalid input");
            }
        }
    }
}
name: abc name: def name: ghi name: jkl # abc, def, ghi, jkl
name: abc x name: def name: ghi name: jkl # Invalid input
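Answer 3 (score: 0)
I finally found a solution that also yields the "error position", i.e. the position at which the repeated match first fails. I hope anyone who runs into the same problem finds this answer useful: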
string s = "name: abc x name: def name: ghi name: jkl";
string pattern = @"\Gname:\s*([a-z]+)\s*";
int endOfLastMatch = 0;
// by using the \G anchor we can manage pattern repetition in a for-loop
for (Match m = Regex.Match(s, pattern); m.Success; m = m.NextMatch())
{
    Console.WriteLine(m.Groups[1].Value + ", ");
    // keep track of where the last successful match ended
    endOfLastMatch = m.Index + m.Length;
}
// if matching stopped before the end of the input, report where it happened
if (endOfLastMatch != s.Length)
{
    int reportSize = Math.Min(s.Length - endOfLastMatch, 10);
    string remainder = s.Substring(endOfLastMatch, reportSize);
    Console.WriteLine("Error: RegEx match failed at index "
        + endOfLastMatch + " (\"" + remainder + "...\")");
}
Output:
abc, 
Error: RegEx match failed at index 10 ("x name: de...")
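The magic happens with the \G anchor, which lets us use NextMatch for the pattern repetition while forcing each match to start exactly where the previous one ended. It is, so to speak, the match-boundary equivalent of the ^ (start of line or string) anchor.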
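The price is that we don't parse the text in one piece (as in the accepted answer) but in a for loop. Since that cost is comparatively small, I think it is acceptable.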
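The same loop can be wrapped in a small helper so that callers get both the values and the failure position. This is only a sketch: the class TightParser, the method Parse, and the convention of returning -1 when the whole input was consumed are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TightParser
{
    // Hypothetical helper: adds each value to 'values' and returns the index
    // where matching stopped, or -1 if the whole input was consumed.
    public static int Parse(string input, List<string> values)
    {
        const string pattern = @"\Gname:\s*([a-z]+)\s*";
        int endOfLastMatch = 0;

        // \G forces every match to start where the previous one ended.
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            values.Add(m.Groups[1].Value);
            endOfLastMatch = m.Index + m.Length;
        }

        return endOfLastMatch == input.Length ? -1 : endOfLastMatch;
    }

    public static void Main()
    {
        var values = new List<string>();
        int errorIndex = Parse("name: abc x name: def name: ghi name: jkl", values);

        Console.WriteLine(string.Join(",", values));
        Console.WriteLine(errorIndex < 0
            ? "OK"
            : "Match failed at index " + errorIndex);
    }
}
```

With the "non-tight" input from the question, the list contains only abc and the returned index is 10, matching the output shown above.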