Regex pattern/group for exactly 4 digits also matches dates with slashes

Time: 2019-06-23 17:36:28

Tags: c# .net regex

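I have a program written in C# that extracts patterns from a CSV file containing examination results. One of the regular expressions, which is meant to match a four-digit centre number, also matches other strings, such as date-time strings that contain slashes. The four-digit regex captures a named group called centreNumber: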
(?<centreNumber>[0-9]{4})
The matches logged after applying the pattern include:

matched centre number -> 6319
matched centre number -> 4/22/2017 6:28:17 PM
matched centre number -> 2016 MALAWI SCHOOL CERTIFICATE OF EDUCATION EXAMINATIONS


Sample input, shown line by line as it appears in the CSV:

CENTRE NO: LIKOMA SECONDARY
CAND.ID
0035
4/22/2017 6:28:17 PM
CENTRE NO: LIKOMA SECONDARY
CAND.ID
5035
4/22/2017 6:28:17 PM
CENTRE NO: CHIFUNGA COMMUNITY
CAND.ID
0224
4/22/2017 6:28:46 PM
CENTRE NO: CHIKONDE COMMUNITY
CAND.ID
0238
4/22/2017 6:28:46 PM


Expected output for the sample input above:

0035
5035
0224
0238


To access the named group, I have loaded the regex into a constant:

        // Declared in classes.AppConstants:
        // public const String REGEX_MSCE_CENTRE_NO = @"(?<centreNumber>[0-9]{4})";
        Regex cNoRegex = new Regex(classes.AppConstants.REGEX_MSCE_CENTRE_NO, RegexOptions.Compiled | RegexOptions.IgnoreCase);

        StreamReader sr = new StreamReader(filepath);
        while (!sr.EndOfStream)
        {
            var oneLine = sr.ReadLine(); // read a single line from the CSV

            MatchCollection matches = cNoRegex.Matches(oneLine);
            if (matches.Count == 1)
            {
                // logs the whole line whenever it contains exactly one four-digit run
                Console.WriteLine("matched centre number -> " + oneLine);
            }
        }

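1 Answer:

Answer 0: (score: 3)

As FLydog57 noted in the comments, the pattern is unanchored, so it matches any run of four digits inside a longer line, such as the 2017 in a date. Adding start and end anchors, so that the whole line must be exactly four digits, should solve the problem: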

^[0-9]{4}$
^\d{4}$
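
To tie the anchors back to the question's line-by-line loop and its centreNumber named group, here is a minimal sketch. The class name CentreNumberExtractor, the method name Extract, and the call to Trim() are assumptions made for illustration; they do not come from the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the asker's code base.
public static class CentreNumberExtractor
{
    // The question's named group, wrapped in start and end anchors.
    private static readonly Regex CentreNo =
        new Regex(@"^(?<centreNumber>[0-9]{4})$", RegexOptions.Compiled);

    // Returns the captured value for every line that consists of exactly four digits.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (var line in lines)
        {
            // Trim() is an assumption: it tolerates stray whitespace around the field.
            Match m = CentreNo.Match(line.Trim());
            if (m.Success)
            {
                // Read the named group rather than echoing the whole line.
                result.Add(m.Groups["centreNumber"].Value);
            }
        }
        return result;
    }
}
```

The two patterns above are not strictly identical in .NET: by default \d also matches decimal digits from other Unicode scripts, while [0-9] accepts only ASCII digits (RegexOptions.ECMAScript restricts \d to ASCII as well). The sketch uses [0-9] to stay with the question's original character class.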

Demo

Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"^[0-9]{4}$";
        string input = @"6319
4/22/2017 6:28:17 PM
2016 MALAWI SCHOOL CERTIFICATE OF EDUCATION EXAMINATIONS
2016";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

Please see the c# demo here.
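
One caveat when matching a whole multi-line string rather than reading line by line: in .NET, $ with RegexOptions.Multiline matches only before \n, so on text with Windows-style \r\n line breaks the trailing \r stops ^[0-9]{4}$ from matching every line except possibly the last. A hedged sketch of one way around it, using an optional \r before the end anchor; the class name MultilineCentreNumbers and method name Find are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MultilineCentreNumbers
{
    // \r? lets the match end just before a \r\n line break as well as before a bare \n.
    private static readonly Regex FourDigitLine =
        new Regex(@"^(?<centreNumber>[0-9]{4})\r?$", RegexOptions.Multiline);

    // Returns the four-digit value of every line that contains nothing else.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in FourDigitLine.Matches(text))
        {
            // The named group excludes the optional \r consumed by the pattern.
            found.Add(m.Groups["centreNumber"].Value);
        }
        return found;
    }
}
```

Reading the file with StreamReader.ReadLine, as in the question, avoids the issue entirely because the line terminator is already stripped from each returned line.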