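Identify a date at the beginning of a string

Posted: 2019-04-05 13:43:34

Tags: python regex date

My strings start with a date, and I want to remove that date from the string.

The date can come in several formats, and I don't know in advance which one (in any case, I could manually pick the more common ones, such as dd-mm-yyyy, dd-mm, dd/mm, ...).

I need to extract and store the substring that follows the date.

Example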

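For example, I have the following sentences and desired outputs:

02/01/2019 The UK prime minister -> The UK prime minister

02-01-2019 The UK prime minister -> The UK prime minister

The UK prime minister in 02/01/2019 -> The UK prime minister in 02/01/2019

02-01-2019 18:52:02 The UK prime minister -> The UK prime minister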

I think a regular expression might be a good choice, but I can't get the regex right. Other approaches are welcome too!

2 Answers

Answer 0 (score: 0)

You can use this regex to remove the date formats you mentioned:

^(?:\d{2}[/-]){2}\d{4}(?:\s+(?:\d{2}:){2}\d{2}\b)? 

Demo 1
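The sample code further down only exercises the enhanced regex. As a minimal sketch, the stricter regex above can be applied with `re.sub` to the four strings from the question. The trailing ` ?` (which also drops the space after the date) is borrowed from that sample code, and `strip_leading_date` is a hypothetical helper name.

```python
import re

# Stricter pattern: dd/mm/yyyy or dd-mm-yyyy, optionally followed by hh:mm:ss.
# The trailing " ?" also removes the single space between the date and the text.
DATE_PREFIX = re.compile(r'^(?:\d{2}[/-]){2}\d{4}(?:\s+(?:\d{2}:){2}\d{2}\b)? ?')


def strip_leading_date(s):
    """Hypothetical helper: remove a leading date (and time) if present."""
    return DATE_PREFIX.sub('', s)


examples = [
    '02/01/2019 The UK prime minister',
    '02-01-2019 The UK prime minister',
    'The UK prime minister in 02/01/2019',
    '02-01-2019 18:52:02 The UK prime minister',
]

for s in examples:
    print(s, '-->', strip_leading_date(s))
```

Because the pattern is anchored with `^`, a date in the middle or at the end of the string is left alone.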

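If you want to support more formats, such as year-first dates like 2019-10-22, you can use this extended regex (it also accepts a space as the separator):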

^(?:\d{2,4}[/ -]){2}\d{2,4}(?:\s+(?:\d{2}:){2}\d{2}\b)? 

Demo 2
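Neither regex above matches the yearless dd-mm / dd/mm forms mentioned in the question (for example `02/01 The UK prime minister`). A possible day-first variant, sketched here as an assumption rather than a tested drop-in, makes the year optional and requires whitespace (or the end of the string) after the date. It is a heuristic: it would also strip `12-3` from `12-3 votes`, and it does not handle year-first dates such as `2019-01-02`. `strip_day_first_date` is a hypothetical name.

```python
import re

# Hypothetical day-first pattern: dd-mm, dd/mm, dd-mm-yy, dd/mm/yyyy, ...
# optionally followed by a time (hh:mm or hh:mm:ss). The date must be followed
# by whitespace or the end of the string, so "02/01/20190" is left untouched.
DAY_FIRST_PREFIX = re.compile(
    r'^\d{1,2}[/-]\d{1,2}(?:[/-]\d{2,4})?'  # day, month, optional year
    r'(?:\s+\d{1,2}:\d{2}(?::\d{2})?)?'     # optional time
    r'(?:\s+|$)'                            # whitespace (or end) after the date
)


def strip_day_first_date(s):
    """Drop a leading day-first date (and time) if present; keep the rest."""
    return DAY_FIRST_PREFIX.sub('', s, count=1)


for s in ['02/01 The UK prime minister',
          '2-1-19 The UK prime minister',
          '02-01-2019 18:52 The UK prime minister',
          'The UK prime minister in 02/01']:
    print(s, '-->', strip_day_first_date(s))
```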

Sample Python code:

import re

arr = ['02/01/2019 The UK prime minister','02-01-2019 The UK prime minister','The UK prime minister in 02/01/2019','02-01-2019 18:52:02 The UK prime minister','2019-01-02 The UK prime minister','2019/01/02 The UK prime minister','2019 01 02 The UK prime minister','2019-01-02 18:52:02 The UK prime minister','2019/01/02 18:52:02 The UK prime minister','2019 01 02 The UK prime minister']

for s in arr:
    # Remove a leading date (optionally followed by hh:mm:ss) and one trailing space.
    print(s, '-->', re.sub(r'^(?:\d{2,4}[/ -]){2}\d{2,4}(?:\s+(?:\d{2}:){2}\d{2}\b)? ?', '', s))

Output:

02/01/2019 The UK prime minister --> The UK prime minister
02-01-2019 The UK prime minister --> The UK prime minister
The UK prime minister in 02/01/2019 --> The UK prime minister in 02/01/2019
02-01-2019 18:52:02 The UK prime minister --> The UK prime minister
2019-01-02 The UK prime minister --> The UK prime minister
2019/01/02 The UK prime minister --> The UK prime minister
2019 01 02 The UK prime minister --> The UK prime minister
2019-01-02 18:52:02 The UK prime minister --> The UK prime minister
2019/01/02 18:52:02 The UK prime minister --> The UK prime minister
2019 01 02 The UK prime minister --> The UK prime minister
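The question also asks to store what comes after the date. A minimal sketch, assuming the enhanced regex from this answer is acceptable, uses `re.match` and slices the string at `m.end()` so that both the matched date and the remainder are kept. `split_leading_date` is a hypothetical helper, and stripping all leading whitespace from the remainder (rather than a single space) is a choice made for this sketch.

```python
import re

# Enhanced pattern from this answer (year-first or day-first, optional hh:mm:ss).
DATE_PREFIX = re.compile(r'(?:\d{2,4}[/ -]){2}\d{2,4}(?:\s+(?:\d{2}:){2}\d{2}\b)?')


def split_leading_date(s):
    """Return (date_string or None, remaining_text)."""
    m = DATE_PREFIX.match(s)  # match() only tries the start of the string
    if m is None:
        return None, s
    # Keep the matched date, and everything after it minus leading whitespace.
    return m.group(0), s[m.end():].lstrip()


for s in ['02-01-2019 18:52:02 The UK prime minister',
          '2019/01/02 The UK prime minister',
          'The UK prime minister in 02/01/2019']:
    print(split_leading_date(s))
```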

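Answer 1 (score: 0)

You don't need to replace the date with an empty string to remove it. I'm assuming your input is a list, so you can try the following: RegexDemo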

import re

mylist = ["02/01/2019 The UK prime minister",
          "02-01-2019 The UK prime minister",
          "The UK prime minister in 02/01/2019",
          "02-01-2019 18:52:02 The UK prime minister"]

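for d in mylist:
    # Skip a leading run of digits, '/', '-', ':' and whitespace, then capture
    # the rest of the string from the first word character onwards.
    # re.match anchors at the start, and '*' lets strings without a date match too,
    # so "The UK prime minister in 02/01/2019" is kept whole instead of losing "The".
    m = re.match(r"[0-9/:\s-]*(\w.*)", d)
    rest = m.group(1) if m else d
    print(rest)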
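Since the question also welcomes non-regex approaches, here is a minimal sketch that tries the first one or two whitespace-separated tokens against a list of `datetime.strptime` formats. The format list is an assumption covering the examples in this thread; yearless formats such as dd-mm are left out because `strptime` would silently fill in a default year. A side benefit is validation: `31/02/2019` raises `ValueError` and is therefore not treated as a date. `split_date_prefix` and `FORMATS` are hypothetical names, and the helper returns a parsed `datetime` rather than the original date text.

```python
from datetime import datetime

# Candidate formats, longest first, so "date + time" wins over "date" alone.
# Each entry: (number of whitespace-separated tokens the format spans, format).
FORMATS = [
    (2, '%d/%m/%Y %H:%M:%S'),
    (2, '%d-%m-%Y %H:%M:%S'),
    (2, '%Y-%m-%d %H:%M:%S'),
    (1, '%d/%m/%Y'),
    (1, '%d-%m-%Y'),
    (1, '%Y-%m-%d'),
]


def split_date_prefix(s):
    """Return (datetime or None, text after the date)."""
    tokens = s.split(maxsplit=2)
    for n, fmt in FORMATS:
        if len(tokens) < n:
            continue
        candidate = ' '.join(tokens[:n])
        try:
            parsed = datetime.strptime(candidate, fmt)
        except ValueError:
            continue  # not this format (or not a real calendar date)
        # Everything after the first n tokens is the text we want to keep.
        parts = s.split(maxsplit=n)
        return parsed, parts[n] if len(parts) > n else ''
    return None, s


for s in ['02-01-2019 18:52:02 The UK prime minister',
          '02/01/2019 The UK prime minister',
          'The UK prime minister in 02/01/2019']:
    print(split_date_prefix(s))
```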