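My string starts with a date, and I want to remove that date from the string.
The date can come in several formats that I don't know in advance (if necessary, I can hand-pick the most common ones, e.g. dd-mm-yyyy, dd-mm, dd/mm, ...).
I need to extract and store the substring that follows the date.
Example:
I have the following sentences and the desired output for each:
02/01/2019 The UK prime minister -> The UK prime minister
02-01-2019 The UK prime minister -> The UK prime minister
The UK prime minister in 02/01/2019 -> The UK prime minister in 02/01/2019
02-01-2019 18:52:02 The UK prime minister -> The UK prime minister
I think a regular expression might be a good option, but I haven't been able to work out the right regex. Other approaches are welcome too!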
Answer 0 (score: 0)
You can use this regex to remove the date formats you mentioned:
^(?:\d{2}[/-]){2}\d{4}(?:\s+(?:\d{2}:){2}\d{2}\b)?
If you want to support more formats, such as the year-first 2019-10-22, you can use this enhanced regex:
^(?:\d{2,4}[/ -]){2}\d{2,4}(?:\s+(?:\d{2}:){2}\d{2}\b)?
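To make the enhanced pattern easier to read, here is a minimal sketch that spells out the same tokens with re.VERBOSE and comments. The names DATE_PREFIX and strip_date_prefix are made up for illustration and are not part of the original answer.

```python
import re

# Same tokens as the enhanced regex, spread over several lines.
# In re.VERBOSE mode, whitespace inside a character class is still literal,
# so the space in [/ -] keeps its meaning.
DATE_PREFIX = re.compile(r"""
    ^                               # only match at the start of the string
    (?:\d{2,4}[/ -]){2}             # two groups of 2-4 digits, each followed by a separator
    \d{2,4}                         # last group of 2-4 digits
    (?:\s+(?:\d{2}:){2}\d{2}\b)?    # optional time such as 18:52:02
""", re.VERBOSE)

def strip_date_prefix(s):
    """Hypothetical helper: drop a leading date (and time), then leading whitespace."""
    return DATE_PREFIX.sub("", s, count=1).lstrip()

print(strip_date_prefix("2019/01/02 18:52:02 The UK prime minister"))
print(strip_date_prefix("The UK prime minister in 02/01/2019"))
```

Note that the pattern only checks the shape of the prefix, so something like 123-456-7890 at the start of a string would also be removed.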
Sample Python code:
import re
arr = ['02/01/2019 The UK prime minister','02-01-2019 The UK prime minister','The UK prime minister in 02/01/2019','02-01-2019 18:52:02 The UK prime minister','2019-01-02 The UK prime minister','2019/01/02 The UK prime minister','2019 01 02 The UK prime minister','2019-01-02 18:52:02 The UK prime minister','2019/01/02 18:52:02 The UK prime minister','2019 01 02 The UK prime minister']
for s in arr:
    print(s, '-->', re.sub(r'^(?:\d{2,4}[/ -]){2}\d{2,4}(?:\s+(?:\d{2}:){2}\d{2}\b)? ?', '', s))
Output:
02/01/2019 The UK prime minister --> The UK prime minister
02-01-2019 The UK prime minister --> The UK prime minister
The UK prime minister in 02/01/2019 --> The UK prime minister in 02/01/2019
02-01-2019 18:52:02 The UK prime minister --> The UK prime minister
2019-01-02 The UK prime minister --> The UK prime minister
2019/01/02 The UK prime minister --> The UK prime minister
2019 01 02 The UK prime minister --> The UK prime minister
2019-01-02 18:52:02 The UK prime minister --> The UK prime minister
2019/01/02 18:52:02 The UK prime minister --> The UK prime minister
2019 01 02 The UK prime minister --> The UK prime minister
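Since the question also asks to store the substring that follows the date, here is a possible sketch that returns both parts instead of only printing the remainder. It reuses the enhanced regex; the function name split_date_prefix is an assumption for illustration.

```python
import re

# Enhanced regex from this answer, compiled once for reuse.
DATE_PREFIX = re.compile(r'^(?:\d{2,4}[/ -]){2}\d{2,4}(?:\s+(?:\d{2}:){2}\d{2}\b)?')

def split_date_prefix(s):
    """Hypothetical helper: return (date, rest); date is None if s has no leading date."""
    m = DATE_PREFIX.match(s)
    if m is None:
        return None, s
    # m.end() is the index just past the matched date (and optional time).
    return m.group(0), s[m.end():].lstrip()

date, rest = split_date_prefix('02-01-2019 18:52:02 The UK prime minister')
print(repr(date), repr(rest))
```

Returning None for the date makes it easy to tell strings that had no leading date apart from strings whose date was removed.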
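Answer 1 (score: 0)
You don't need to substitute an empty string to remove the date; you can capture the text that follows it instead. I assume your input is a list, so you can try the following: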
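import re

mylist = ["02/01/2019 The UK prime minister",
          "02-01-2019 The UK prime minister",
          "The UK prime minister in 02/01/2019",
          "02-01-2019 18:52:02 The UK prime minister"]

for d in mylist:
    # Skip a leading run of digits, '/', '-', ':' and whitespace, then capture the rest.
    # re.match anchors at the start of the string; '*' (rather than '+') leaves
    # strings without a leading date unchanged instead of cutting off their first word.
    match = re.match(r"[0-9/\-:\s]*(\w.*)", d)
    print(match.group(1) if match else d)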
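One caveat: a character class accepts any run of digits and separators, so it cannot tell a date from an ordinary leading number such as the "10" in "10 Downing Street". Since the question welcomes other approaches, here is a rough sketch that validates the prefix with datetime.strptime. The FORMATS list and the function name strip_leading_date are assumptions; the list would need to be filled in with whatever formats actually occur in the data.

```python
from datetime import datetime

# Hypothetical, hand-picked list of formats; extend as needed.
FORMATS = ["%d/%m/%Y %H:%M:%S", "%d-%m-%Y %H:%M:%S", "%d/%m/%Y", "%d-%m-%Y"]

def strip_leading_date(s):
    """Return (parsed datetime or None, remaining text)."""
    tokens = s.split()
    # Try the first two tokens (date + time) before the first token alone.
    for n in (2, 1):
        if len(tokens) < n:
            continue
        candidate = " ".join(tokens[:n])
        for fmt in FORMATS:
            try:
                parsed = datetime.strptime(candidate, fmt)
            except ValueError:
                continue  # this format does not fit, try the next one
            # Note: joining the tokens collapses repeated whitespace in the remainder.
            return parsed, " ".join(tokens[n:])
    return None, s

print(strip_leading_date("02-01-2019 18:52:02 The UK prime minister"))
print(strip_leading_date("10 Downing Street"))
```

The trade-off is that only the listed formats are recognised, but impossible dates such as 99/99/2019 are rejected and the parsed date is available for storage.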