我有以下文字:
Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety
我需要将日期+幻灯片替换为.
(点)才能得到以下结果:
Filling a gap. Small parts example. Padded details. Adds to safety
可能该遮罩可用于标识要删除的文本:
{month} {day}, {year} {Slide} {slide number}
我可以使用正则表达式删除月份,如下所示:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
但是如何定义蒙版并将所有内容放在一起? 不确定正则表达式是否是正确的解决方案,或者它是矫kill过正。
答案 0 :(得分:3)
匹配1到31之间的日子以使其更加具体,然后滑动后跟1个或多个数字。
如果前后匹配空格,并用点和单个空格代替,则将省略双倍空格。
替换为.
\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*
import re
pattern=r"\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*"
s="Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety"
print(re.sub(pattern, ". ", s))
输出
Filling a gap. Small parts example. Padded details. Adds to safety
答案 1 :(得分:1)
尝试一下
(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+