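Matching possible date elements in a range

Time: 2018-09-04 00:03:49

Tags: regex

I'm having trouble matching the different forms a date range can take. The end goal is to extract each group so that I can build ISO 8601 formatted dates.

Test cases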

May 8th – 14th, 2019
November 25th – December 2nd
November 5th, 2018 – January 13th, 2019
September 17th – 23rd

Regex

(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}\s–\s(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}

regexr

I'd like to be able to capture every group, whether or not it is present in the input.

For example, May 8th – 14th, 2019

Group 1 May
Group 2 8th
Group 3 
Group 4 
Group 5 14th
Group 6 2019

And for November 5th, 2018 – January 13th, 2019

Group 1 November
Group 2 5th
Group 3 2018
Group 4 January
Group 5 13th
Group 6 2019

2 answers:

Answer 0: (score: 1)

To capture an empty string when a group has nothing to match, the general idea is to use (<characters to match>|)
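As a rough illustration of the difference between an optional group and a group with an empty alternative, here is a minimal Python sketch. The names `optional` and `empty_alt` and the simplified day pattern are made up for this example and are not part of the answer.

```python
import re

# Optional group: if it does not participate, Python reports it as None.
optional = re.compile(r"(\d+)(st|nd|rd|th)?")

# Empty alternative: the group always participates, possibly matching "".
empty_alt = re.compile(r"(\d+)(st|nd|rd|th|)")

print(optional.fullmatch("14").groups())     # ('14', None)
print(empty_alt.fullmatch("14").groups())    # ('14', '')
print(empty_alt.fullmatch("14th").groups())  # ('14', 'th')
```

Which form is more convenient depends on whether the calling code would rather test for an empty string or for a missing value.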

Try this:

([A-z]{3,9})\s((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)\s–\s([A-z]{3,9}|)\s?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)

https://regex101.com/r/4UY0WE/1/

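When capturing the month (the first group), make sure to use [A-z]{3,9} rather than \w{3,9}; otherwise you may match, for example, 23rd instead of a month string. (Strictly speaking, [A-z] also matches the six ASCII characters that sit between Z and a, namely [ \ ] ^ _ and `, so [A-Za-z]{3,9} is the letters-only form.)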
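The point about \w can be checked quickly in Python. This is only a sketch: the `\b` word boundaries and the variable names are additions for the demonstration, and [A-Za-z] is used as the stricter letters-only class.

```python
import re

text = "September 17th – 23rd"

# \w also matches digits and underscore, so day tokens qualify as "months".
word_months = re.findall(r"\b\w{3,9}\b", text)

# Letters only: "17th" and "23rd" are skipped.
letter_months = re.findall(r"\b[A-Za-z]{3,9}\b", text)

# [A-z] spans the ASCII range from "A" to "z", which includes "_".
loose = re.fullmatch(r"[A-z]{3}", "a_b")

print(word_months)        # ['September', '17th', '23rd']
print(letter_months)      # ['September']
print(loose is not None)  # True
```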

Broken down:

([A-z]{3,9})      # Month ("January")
\s
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))  # Day of month, including suffix ("23rd")
(?:, (?=19|20))?  # Comma and space, if followed by year
(\d{4}|)          # Year
\s–\s             # Range separator (en dash surrounded by whitespace)
([A-z]{3,9}|)     # Second month: same as first line, but may be empty
\s?

# same as third to fifth lines:
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th)) 
(?:, (?=19|20))?
(\d{4}|)
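To see what the six groups contain for each of the question's test cases, the one-line pattern from this answer can be run in Python as follows. This is a sketch: `PATTERN`, `TEST_CASES` and `results` are names chosen for this example, and the pattern is split across two string literals only for line length.

```python
import re

# Pattern copied unchanged from the answer (note the en dash separator).
PATTERN = re.compile(
    r"([A-z]{3,9})\s((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)"
    r"\s–\s"
    r"([A-z]{3,9}|)\s?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)"
)

TEST_CASES = [
    "May 8th – 14th, 2019",
    "November 25th – December 2nd",
    "November 5th, 2018 – January 13th, 2019",
    "September 17th – 23rd",
]

# Map each input to its six captured groups; missing parts come back as "".
results = {text: PATTERN.fullmatch(text).groups() for text in TEST_CASES}

for text, groups in results.items():
    print(text, "->", groups)
# May 8th – 14th, 2019 -> ('May', '8th', '', '', '14th', '2019')
```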

Answer 1: (score: 1)

You can save some space by merging a few of the groups.

Try it here

Full regex:

([A-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))? [–-] ([A-z]{3,9}\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?

Separated by group (spaces replaced with \s for readability)

1. ([A-z]{3,9})
   \s
2. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
3. (?:,\s((?:19|20)\d{2}))?
   \s[–-]\s
4. ([A-z]{3,9}\s)?
5. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
6. (?:,\s((?:19|20)\d{2}))?

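This approach does not use lookarounds, so it should generally be safe for any regex engine. Two things to keep in mind: groups that do not participate in the match are reported as unset (for example null or None, depending on the engine) rather than as empty strings, and group 4, when present, includes its trailing whitespace.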
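Since the question's goal is ISO 8601 output, here is one possible Python sketch built on this answer's pattern. Several things here are assumptions and not part of the answer: the function name `to_iso_range`, the `default_year` parameter used when neither side has a year, the rule that a missing start year inherits the end year, full English month names only, [A-Za-z] in place of [A-z], and moving the second month's trailing space outside its capture group.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Answer 1's pattern with the two small adjustments described above.
RANGE_RE = re.compile(
    r"([A-Za-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?"
    r" [–-] "
    r"(?:([A-Za-z]{3,9})\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?"
)


def to_iso_range(text, default_year):
    """Return (start, end) as ISO 8601 strings (hypothetical helper)."""
    match = RANGE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized date range: {text!r}")
    month1, day1, year1, month2, day2, year2 = match.groups()

    # Unmatched optional groups are None; fill them from the other side.
    year2 = year2 or year1 or str(default_year)
    year1 = year1 or year2
    month2 = month2 or month1

    def build(month, day, year):
        # day[:-2] drops the two-letter ordinal suffix ("8th" -> "8").
        # MONTHS.index raises ValueError for anything but a full month name.
        return date(int(year), MONTHS.index(month) + 1, int(day[:-2])).isoformat()

    return build(month1, day1, year1), build(month2, day2, year2)


print(to_iso_range("May 8th – 14th, 2019", 2018))
# ('2019-05-08', '2019-05-14')
print(to_iso_range("November 25th – December 2nd", 2018))
# ('2018-11-25', '2018-12-02')
```

This sketch does not handle a range that crosses a year boundary when only one year (or none) is written, such as a November start with a January end; that case would need an extra rule.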