Why doesn't my regular expression return the right group(0)?

Time: 2018-10-11 17:49:47

Tags: python regex

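I want to find a date in a large number of files. The date is on a single line, in the format "21 September 2010". Each file contains only one such date.

The code below returns only the month, e.g. "September". Why doesn't group(0) give me something like "21 September 2010"? What am I missing? Thanks!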

months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

pattern = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"
match = re.search(pattern, text)
if match:
    fdate = match.group(0)

1 Answer:

Answer 0 (score: 2)

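If you print the regex, you will see that it looks like ^\d{2} +January|February|March|April|May|June|July|August|September|October|November|December +\d{4}$. Applied to 21 September 2010, it matches only September. Because the month alternatives are not grouped, the alternation splits the whole pattern: ^\d{2} + belongs only to the January branch,  +\d{4}$ belongs only to the December branch, and every other month name is a complete branch of its own that can match anywhere in the string.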
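A minimal sketch that makes the split visible. It rebuilds the pattern from the question, cuts it at each "|" to show the top-level branches, and then runs the search. The names `ungrouped` and `branches` are introduced here for illustration only.

```python
import re

months = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# The pattern from the question, built without grouping the month names.
ungrouped = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"

# Alternation has the lowest precedence, so each "|" starts a new
# top-level branch. Splitting the string shows what each branch contains.
branches = ungrouped.split("|")
print(branches[0])   # ^\d{2} +January
print(branches[8])   # September
print(branches[-1])  # December +\d{4}$

# Only the bare "September" branch can match this input.
match = re.search(ungrouped, "21 September 2010")
print(match.group(0))  # September
```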

You need to group the month alternatives:

pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))
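The doubled braces in that line are there because str.format treats single braces as placeholders, so {{2}} and {{4}} come out as the literal quantifiers {2} and {4}. As a rough check, the sketch below compares the formatted pattern with one built by plain concatenation; the names `alternatives`, `formatted` and `concatenated` are illustrative, not from the original answer.

```python
import re

months = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
alternatives = "|".join(months)

# str.format turns "{{" and "}}" into literal braces and fills "{}".
formatted = r"^\d{{2}} +(?:{}) +\d{{4}}$".format(alternatives)

# The same pattern built by concatenation, as in the question's style.
concatenated = r"^\d{2} +(?:" + alternatives + r") +\d{4}$"

print(formatted == concatenated)  # True

# With the non-capturing group, the digits and spaces apply to every month.
match = re.search(formatted, "21 September 2010")
print(match.group(0))  # 21 September 2010
```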

See the Python demo:

import re
text = "21 September 2010"
months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))
match = re.search(pattern, text)
if match:
    fdate = match.group(0)
    print(fdate) # => 21 September 2010
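One caveat, sketched under an assumption the question does not confirm: if `text` holds the whole contents of a file rather than a single line, ^ and $ anchor only at the start and end of the string by default. Passing re.MULTILINE makes them match at line boundaries instead. The sample file contents below are hypothetical.

```python
import re

months = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# re.MULTILINE lets ^ and $ match at the start and end of each line.
pattern = re.compile(
    r"^\d{2} +(?:" + "|".join(months) + r") +\d{4}$",
    re.MULTILINE,
)

# Hypothetical file contents; the question does not show a real file.
text = "Report\n21 September 2010\nBody text\n"

match = pattern.search(text)
if match:
    fdate = match.group(0)
    print(fdate)  # 21 September 2010
```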