Regular expression to match a date (Month Day, Year or Month/Day/Year)

Time: 2019-05-09 18:40:11

Tags: c# regex regex-lookarounds regex-group regex-greedy

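I am trying to write a regular expression that finds dates in a string. The date may be preceded (or followed) by spaces, digits, text, an end of line, and so on. The expression should handle US date formats that are either

1) Month name Day, Year, e.g. January 10, 2019, or
2) mm/dd/yy, e.g. 11/30/19

I found this for Month name Day, Year: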

(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}

(thanks to Veverke here: Regex to match date like month name day comma and year)

and this for mm/dd/yy (and the various combinations of m/d/y):

(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2} 

(thanks to Steven Levithan and Jan Goyvaerts here: https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html)

I tried to combine them like this:

((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})

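This mostly works, but when I search the input string "Paid on 1/1/2019" for "on [regex above]", it finds the date but the match does not include the word "on". The whole string is found if I use only
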
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}

Can anyone see what I am doing wrong?

Edit

I am using the C# .NET code below:

    string stringToSearch = "Paid on 1/1/2019";
    string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
    var match = Regex.Match(stringToSearch, searchPattern, RegexOptions.IgnoreCase);


    string foundString;
    if (match.Success)
        foundString= stringToSearch.Substring(match.Index, match.Length);

For example:

string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
stringToSearch = "Paid on Jan 1, 2019";
found = "on Jan 1, 2019" -- worked as expected, found the word "on" and the date

stringToSearch = "Paid on 1/1/2019";
found = "1/1/2019"  -- did not work as expected, found the date but did not include the word "on"

If I reverse the pattern:

string searchPattern = @"on ((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})";

stringToSearch = "Paid on Jan 1, 2019";
found = "Jan 1, 2019" -- did not work as expected, found the date but did not include the word "on"

stringToSearch = "Paid on 1/1/2019";
found = "on 1/1/2019" -- worked as expected, found the word "on" and the date

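Thanks

1 answer:

Answer 0 (score: 0)

Both of your expressions look fine, and both work. If you want to capture whatever comes before or after the target output, you can simply add two boundaries, one on the left and one on the right, and that will do it for you. For example, see this test: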

(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)

That is, you can add two groups similar to (.*) and wrap your original expressions in a single group.
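
To illustrate why wrapping both alternatives in one group matters, here is a minimal sketch in JavaScript, like the demos in this answer. In .NET and JavaScript alike, alternation (|) has the lowest precedence, so a literal prefix such as "on " written before the first alternative belongs to that alternative only. The names monthDate, numericDate, ungrouped, grouped and the helper firstMatch are made up for this illustration. The sub-expressions are copied from the question.

```javascript
// Sub-expressions copied from the question (backslashes doubled for string literals).
const monthDate = "(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\\s+\\d{1,2},\\s+\\d{4}";
const numericDate = "(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}";

// Prefix outside any group: parsed as ("on " + monthDate) OR (numericDate).
const ungrouped = new RegExp("on (" + monthDate + ")|(" + numericDate + ")", "i");

// Prefix followed by ONE group that holds both alternatives.
const grouped = new RegExp("on (?:" + monthDate + "|" + numericDate + ")", "i");

// Hypothetical helper: returns the text of the first match, or null.
function firstMatch(regex, text) {
  const m = regex.exec(text);
  return m ? m[0] : null;
}

for (const text of ["Paid on Jan 1, 2019", "Paid on 1/1/2019"]) {
  console.log(text, "| ungrouped:", firstMatch(ungrouped, text), "| grouped:", firstMatch(grouped, text));
}
```

With the ungrouped pattern the numeric date is matched without "on", which is the behavior described in the question. With the grouped pattern both inputs match with the "on " prefix included.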

RegEx Descriptive Graph

This graph visualizes how your expression works, and you may want to test other expressions at this link:

C# Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)";
        string input = @"Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

JavaScript Demo

This JavaScript demo shows that your expression works:

const regex = /(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/gm;
const str = `Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after`;
const subst = `\nGroup 1: $1 \nGroup 2: $2 \nGroup 3: $3 \nGroup 4: $4 `;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

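Basic Performance Test

This JavaScript snippet runs the expression in a for loop 1 million times and reports the runtime as a rough performance measure.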

const repeat = 1000000;
const start = Date.now();

for (var i = repeat; i > 0; i--) {
	const string = 'Paid on Jan 1, 2019';
	const regex = /(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/gm;
	var match = string.replace(regex, "\nGroup #1: $1\nGroup #2: $2 \n");
}

const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match  ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test.  ");

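Improvements

You may want to reduce the number of capturing groups around the month names, and, if you wish, you can wrap all of the alternatives in a single capturing group.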
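
One possible way to do that is sketched below, under the assumption that only the date text itself needs to be captured. The optional month suffixes and the month/day parts become non-capturing groups (?:...), and a single capturing group wraps both date formats. The names monthName, monthDate, numericDate, dateRegex and findDates are hypothetical and chosen only for this example.

```javascript
// Month names with non-capturing groups for the optional suffixes.
const monthName = "(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)";
const monthDate = monthName + "\\s+\\d{1,2},\\s+\\d{4}";
const numericDate = "(?:1[0-2]|0?[1-9])/(?:3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}";

// A single capturing group (group 1) around both date formats.
const dateRegex = new RegExp("(" + monthDate + "|" + numericDate + ")", "gi");

// Hypothetical helper: returns every date found in the text, in order.
function findDates(text) {
  return Array.from(text.matchAll(dateRegex), (m) => m[1]);
}

console.log(findDates("Paid on Jan 1, 2019 and again on 11/30/19"));
// prints [ 'Jan 1, 2019', '11/30/19' ]
```

Because group 1 is the only capturing group, the date is always found at the same group index regardless of which format matched.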