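I need to process some legacy data and parse a few loosely structured text fields. Rather than matching them with ad hoc regular expressions, I am considering writing a simple grammar definition and using a tool to parse the strings against it.
Some sample data from one of the columns to be parsed: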
08-JUL-13 To 09-AUG-13 BREAKFAST 0900 LUNCH 1230
or
08-JUL-13 To 22-AUG-13 LUNCH 1230
or
08 JUL 13 To 16 AUG 13 EAST WARD LUN 0200
So my grammar looks roughly like this. What would the correct regex pattern be?
DateRange:[DateWithOrWithoutDashes TO DateWithOrWithoutDashes] {BlaBla}0..* {Break* Time}0..1 {Lun Time}0..1
Answer 0 (score: 1)
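This pattern matches your first two examples. It expects exactly one word before each time, so the multi-word label in the third example (EAST WARD LUN) is not covered.
([0-9])+(-| )([A-Z])+(-| )([0-9])+(-| )+(To)(-| )+([0-9])+(-| )([A-Z])+(-| )([0-9])+(( )+([A-Z])+( )+([0-9])+)+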
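The third sample row has several words (EAST WARD LUN) before its time, so a pattern that allows only one word per time will reject it. Below is a minimal sketch of a variant that accepts one or more label words per time. The class name `LooseScheduleCheck`, the `Matches` helper and the pattern itself are illustrative assumptions, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LooseScheduleCheck
{
    // Hypothetical variant: two dates separated by "To", followed by one or
    // more "label words + digits" groups. Anchored at both ends so that
    // trailing junk is rejected rather than silently ignored.
    public const string Pattern =
        @"^\d+[- ][A-Z]+[- ]\d+ +To +\d+[- ][A-Z]+[- ]\d+(?:(?: +[A-Z]+)+ +\d+)+$";

    // Returns true when the whole line fits the pattern.
    public static bool Matches(string line)
    {
        return Regex.IsMatch(line, Pattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "08-JUL-13 To 09-AUG-13 BREAKFAST 0900 LUNCH 1230",
            "08-JUL-13 To 22-AUG-13 LUNCH 1230",
            "08 JUL 13 To 16 AUG 13 EAST WARD LUN 0200"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", Matches(sample), sample);
        }
    }
}
```

With the three sample rows from the question, `Matches` returns true for each. This variant only validates the shape of a row; it does not extract the individual fields.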
Answer 1 (score: 1)
You can try the following regex:
^(?<start_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+To\s+(?<end_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+(?:(?<type>[A-Z\s]+?)\s+(?<time>\d{4})\s*)+
C# code sample:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string[] lines = {
            "08-JUL-13 To 09-AUG-13 BREAKFAST 0900 LUNCH 1230",
            "08-JUL-13 To 22-AUG-13 LUNCH 1230",
            "08 JUL 13 To 16 AUG 13 EAST WARD LUN 0200"
        };
        foreach (string line in lines)
        {
            // Dates may use '-' or whitespace as separator; the trailing group repeats once per "type time" pair.
            Match m = Regex.Match(line, @"^(?<start_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+To\s+(?<end_date>\d{2}[-\s][A-Z]{3}[-\s]\d{2})\s+(?:(?<type>[A-Z\s]+?)\s+(?<time>\d{4})\s*)+");
            if (m.Success)
            {
                Console.WriteLine("Start date: {0}", m.Groups["start_date"].Value);
                Console.WriteLine("End date: {0}", m.Groups["end_date"].Value);
                // A repeated group keeps every iteration in Captures, so type and time line up by index.
                for (int i = 0; i < m.Groups["type"].Captures.Count; i++)
                {
                    Console.WriteLine("Event type[{0}]: {1}", i, m.Groups["type"].Captures[i].Value);
                    Console.WriteLine("Event time[{0}]: {1}", i, m.Groups["time"].Captures[i].Value);
                }
                Console.WriteLine();
            }
        }
    }
}
Output:
Start date: 08-JUL-13
End date: 09-AUG-13
Event type[0]: BREAKFAST
Event time[0]: 0900
Event type[1]: LUNCH
Event time[1]: 1230

Start date: 08-JUL-13
End date: 22-AUG-13
Event type[0]: LUNCH
Event time[0]: 1230

Start date: 08 JUL 13
End date: 16 AUG 13
Event type[0]: EAST WARD LUN
Event time[0]: 0200
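The captured values are still plain strings. If typed values are needed, a follow-up step along these lines might help. This is only a sketch: the `ScheduleValues` class and its `ParseDate` and `ParseTime` helpers are hypothetical names, and it assumes the invariant culture, dates in exactly the two spellings shown in the sample data, and times that are always four digits (HHmm). How a two-digit year such as 13 is expanded depends on the calendar settings of the culture in use.

```csharp
using System;
using System.Globalization;

public static class ScheduleValues
{
    // The two date spellings that appear in the sample data (assumption:
    // no other spellings occur in the column).
    private static readonly string[] DateFormats = { "dd-MMM-yy", "dd MMM yy" };

    // Converts a captured date such as "08-JUL-13" or "08 JUL 13".
    // Throws FormatException when the text fits neither format.
    public static DateTime ParseDate(string text)
    {
        return DateTime.ParseExact(
            text, DateFormats, CultureInfo.InvariantCulture, DateTimeStyles.None);
    }

    // Converts a captured four-digit time such as "1230" into a time of day.
    public static TimeSpan ParseTime(string hhmm)
    {
        int hours = int.Parse(hhmm.Substring(0, 2), CultureInfo.InvariantCulture);
        int minutes = int.Parse(hhmm.Substring(2, 2), CultureInfo.InvariantCulture);
        return new TimeSpan(hours, minutes, 0);
    }

    public static void Main()
    {
        DateTime start = ParseDate("08-JUL-13");
        DateTime end = ParseDate("16 AUG 13");
        TimeSpan lunch = ParseTime("1230");

        Console.WriteLine("Start: {0:yyyy-MM-dd}", start);
        Console.WriteLine("End:   {0:yyyy-MM-dd}", end);
        Console.WriteLine("Lunch: {0}", lunch);
    }
}
```

Keeping the regex responsible only for splitting the row and leaving conversion to `DateTime.ParseExact` means malformed values such as an impossible day of the month surface as a `FormatException` instead of passing through as text.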