How do I dynamically capture two dates from a line of text with a regex?

Date: 2017-10-31 17:45:25

Tags: python regex python-2.7 regex-negation regex-lookarounds

My text changes once a week:

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

I'm looking for a regex for Year 1 and Year 2.
(Both change every week, so I need the formula to capture any month, day, and year.)

My output should look like this:

2015 = November 5, 2015
2016 = November 3, 2016

The framework I'm using does not allow regex capture groups or split, so I need the formula to be specific to this type of string.

Thanks!

3 answers:

Answer 0 (score: 2)

Code

Based on my original comment:

See regex in use here

(\w+\s+\d+,\s*(\d+))

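Note: the regex above differs from the one on regex101. This is intentional: regex101 can only demonstrate output through a substitution, so I added .*? to the regex there so that the expected output displays correctly.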
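To reproduce that regex101 substitution locally, here is a minimal sketch using re.sub with the .*? prefix. The replacement string is an assumption (the answer doesn't show the one used on regex101): year, " = ", date, then a newline. Python writes the back-references as \2 and \1.

```python
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# The leading .*? lazily consumes everything up to the next date, so the
# text between dates is dropped by the substitution.
# Group 1 is the full date, group 2 is the year.
pattern = r".*?(\w+\s+\d+,\s*(\d+))"

# Assumed replacement: "<year> = <date>" followed by a newline.
result = re.sub(pattern, r"\2 = \1\n", text)
print(result.strip())
```

Any text after the last date would not be matched and would be left in the result unchanged.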

Results

Input

Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015

Output

2016 = November 3, 2016
2015 = November 5, 2015

Usage

import re

regex = r"(\w+\s+\d+,\s*(\d+))"
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
# With two groups, findall returns one (date, year) tuple per match
for date, year in re.findall(regex, text):
    print(year + ' = ' + date)
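The loop prints the dates in the order they appear in the text (2016 first), while the question lists 2015 first. If chronological order matters, a small variation collects the pairs into a dict keyed by year and sorts the keys. This is a sketch that assumes each year appears only once in the line.

```python
import re

regex = r"(\w+\s+\d+,\s*(\d+))"
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# Map each year to its date string; a repeated year would overwrite the earlier entry.
dates_by_year = dict((year, date) for date, year in re.findall(regex, text))

# The years are 4-digit strings here, so string order equals numeric order.
for year in sorted(dates_by_year):
    print(year + ' = ' + dates_by_year[year])
```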

Answer 1 (score: 1)

You can try this:

import re
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
final_data = sorted(["{} = {}".format(re.findall(r"\d+$", i)[0], i) for i in re.findall(r"[a-zA-Z]+\s\d+,\s\d+", text)], key=lambda x: int(re.findall(r"^\d+", x)[0]))
print(final_data)

Output:

['2015 = November 5, 2015', '2016 = November 3, 2016']
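The one-liner packs three regex calls into a single expression. Unrolled into separate steps with the same patterns and the same output, it may be easier to follow; the function name format_dates is only for illustration.

```python
import re

def format_dates(text):
    # 1. Find every "Month D, YYYY" substring.
    dates = re.findall(r"[a-zA-Z]+\s\d+,\s\d+", text)
    # 2. Prefix each date with its trailing number (the year).
    lines = ["{} = {}".format(re.findall(r"\d+$", d)[0], d) for d in dates]
    # 3. Sort numerically by the leading year.
    return sorted(lines, key=lambda line: int(re.findall(r"^\d+", line)[0]))

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
print(format_dates(text))
```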

Answer 2 (score: 0)

Using @ctwheels's regex:

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

import re
result = [(date.split(",")[1].strip(), date) for date in re.findall(r'\w+\s+\d+,\s*\d+', text)]
print(result)

# [('2016', 'November 3, 2016'), ('2015', 'November 5, 2015')]
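This answer relies on split, and answer 0 relies on capture groups, both of which the question says the framework does not allow. A sketch that avoids both, assuming the framework supports lookbehind and can run two separate match-all searches: one pattern returns whole dates, a second uses the lookbehind (?<=,\s) to return only the years, and the two lists are paired by position. Pairing by position assumes every ", YYYY" in the line belongs to a date, which holds for this input but not for arbitrary text.

```python
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# No groups, so findall returns the full "Month D, YYYY" matches.
DATE_RE = r"[A-Za-z]+\s+\d{1,2},\s\d{4}"
# The lookbehind requires ", " before the year but keeps it out of the match.
YEAR_RE = r"(?<=,\s)\d{4}"

dates = re.findall(DATE_RE, text)
years = re.findall(YEAR_RE, text)

# Pair by position, then sort so the older year comes first, as in the question.
pairs = sorted(zip(years, dates))
for year, date in pairs:
    print(year + " = " + date)
```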