如何删除像2019年7月1日这样的字符串

时间:2019-08-13 18:36:15

标签: python regex date datetime

我有一个大字符串,我想从中删除所有日期字符串子字符串。受约束,日期字符串都遵循以下格式:

年份的月份字符串(例如:2018年9月1日)

假设我的字符串是:

bad_s = "It was a fine day. September 1, 2018 and I had a lot of laughs August 2, 2017"

我想回来 good_s = "It was a fine day. and I had a lot of laughs"

在Python中有一种简便的方法吗?

这是我尝试过的:

reg_ex = """/[\'January\'\,\ \'February\'\,\ \'March\'\,\ \'April\'\,\ \'May\'\,\ \'June\'\,\ \'July\'\,\ \'August\'\,\ \'September\'\,\ \'October\'\,\ \'November\'\,\ \'December\'](?:\^\(\[1\-9\]\|\[12\]\\d\|3\[0\-q\]\)\$)/"""
replaced = re.sub(reg_ex, bad_s, "")

但是,这不能代替我想要的。我仍然以bad_s结尾。

编辑:如果它使任何人都更容易,这里是12个月的清单,因此您不必编写它们: months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

2 个答案:

答案 0 :(得分:3)

喜欢吗?

(january|february|march|april|may|june|july|august|september|octorber|november|december) ([1-9]|[1-2]\d|3[01]), \d{4}

请不要忘记/i标志或任何与Python等效的标志。

请注意,这并不关心一个月中有多少天,因此February 31, 2017将会匹配,它也不关心leap年。此正则表达式是匹配器而不是验证器。

如果您希望它更通用并且完全忽略日期验证,那么它将起作用:

(january|february|march|april|may|june|july|august|september|octorber|november|december) \d+, \d+

https://regex101.com/r/zVbb0v/5

答案 1 :(得分:0)

也许您可以尝试以下方法:

重新导入

bad_s = 'It was a fine day. September 20, 2018 and I had a lot of laughs August 2, 2017'
regex = '([^\s]+ ([1-9]|[12]\d|3[01])\, ([12]\d{3}))'

for x in re.findall(regex, bad_s):
    bad_s = bad_s.replace(x[0], '')

print(bad_s)

结果:

"It was a fine day.  and I had a lot of laughs"