我有一个大字符串,我想从中删除所有日期字符串子字符串。受约束,日期字符串都遵循以下格式:
年份的月份字符串(例如:2018年9月1日)
假设我的字符串是:
bad_s = "It was a fine day. September 1, 2018 and I had a lot of laughs August 2, 2017"
我想回来
good_s = "It was a fine day. and I had a lot of laughs"
在Python中有一种简便的方法吗?
这是我尝试过的:
reg_ex = """/[\'January\'\,\ \'February\'\,\ \'March\'\,\ \'April\'\,\ \'May\'\,\ \'June\'\,\ \'July\'\,\ \'August\'\,\ \'September\'\,\ \'October\'\,\ \'November\'\,\ \'December\'](?:\^\(\[1\-9\]\|\[12\]\\d\|3\[0\-q\]\)\$)/"""
replaced = re.sub(reg_ex, bad_s, "")
但是,这不能代替我想要的。我仍然以bad_s
结尾。
编辑:如果它使任何人都更容易,这里是12个月的清单,因此您不必编写它们:
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
答案 0 :(得分:3)
喜欢吗?
(january|february|march|april|may|june|july|august|september|octorber|november|december) ([1-9]|[1-2]\d|3[01]), \d{4}
请不要忘记/i
标志或任何与Python等效的标志。
请注意,这并不关心一个月中有多少天,因此February 31, 2017
将会匹配,它也不关心leap年。此正则表达式是匹配器而不是验证器。
如果您希望它更通用并且完全忽略日期验证,那么它将起作用:
(january|february|march|april|may|june|july|august|september|octorber|november|december) \d+, \d+
答案 1 :(得分:0)
也许您可以尝试以下方法:
重新导入
bad_s = 'It was a fine day. September 20, 2018 and I had a lot of laughs August 2, 2017'
regex = '([^\s]+ ([1-9]|[12]\d|3[01])\, ([12]\d{3}))'
for x in re.findall(regex, bad_s):
bad_s = bad_s.replace(x[0], '')
print(bad_s)
结果:
"It was a fine day. and I had a lot of laughs"