Extracting dates and times with regular expressions

Time: 2016-09-20 10:51:48

Tags: python regex

I am using regular expressions alongside nltk to extract dates and times:

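import re

text = 'LEts have quick meeting on Wednesday at 9am'
week_day = "(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"
# Adjacent string literals are concatenated, so no stray whitespace
# ends up inside the pattern (a backslash continuation would keep it).
month = ("(january|february|march|april|may|june|july|august|september|"
         "october|november|december)")
dmy = "(year|day|week|month)"
exp2 = "(this|next|last)"
regxp2 = "(" + exp2 + " (" + dmy + "|" + week_day + "|" + month + "))"
reg2 = re.compile(regxp2, re.IGNORECASE)
found = reg2.findall(text)
# findall returns one tuple of groups per match; element 0 is the
# outermost group, i.e. the whole matched expression.
found = [a[0] for a in found if len(a) > 1]
timex_found = []
for timex in found:
    timex_found.append(timex)

print(timex_found)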

Everything looks right to me, but it does not pick up Wednesday at all. What should I change so that it matches both "Wednesday" and "this Wednesday"?

Would

regxp2 = "((this|next|last)? (" + dmy + "| " + week_day + "| " + month+ "))"

handle my case?

1 answer:

Answer 0 (score: 3)

The regular expression is looking for ((this|next|last) (dmy|weekday|month))

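Your input contains no match: "Wednesday" is preceded by "on", not by "this", "next", or "last", and the pattern requires one of those three words.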
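A minimal sketch of why nothing is found, reusing the pattern from the question. The helper `find_timex` and the second test sentence are hypothetical names and inputs introduced only for this illustration; the month string is joined without stray whitespace.

```python
import re

week_day = "(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"
month = ("(january|february|march|april|may|june|july|august|september|"
         "october|november|december)")
dmy = "(year|day|week|month)"
exp2 = "(this|next|last)"
regxp2 = "(" + exp2 + " (" + dmy + "|" + week_day + "|" + month + "))"
reg2 = re.compile(regxp2, re.IGNORECASE)


def find_timex(text):
    # Hypothetical helper: return the full text of every match.
    return [m.group(0) for m in reg2.finditer(text)]


# "on Wednesday": no this/next/last before the day, so nothing matches.
no_prefix = find_timex("LEts have quick meeting on Wednesday at 9am")
# "this Wednesday": the mandatory prefix is present, so it matches.
with_prefix = find_timex("LEts have quick meeting this Wednesday at 9am")

print(no_prefix)    # []
print(with_prefix)  # ['this Wednesday']
```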

Alternatives that might work:

((this|next|last|on) (dmy|weekday|month))

((this|next|last)? (dmy|weekday|month))
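Note that the second alternative, as written, still requires a space before the day name, so the match would include that leading space and a day at the very start of the text would be missed. One possible variant, offered as a sketch rather than a definitive fix, makes the prefix together with its whitespace optional and uses word boundaries. The names `TIMEX` and `extract_timex` and the sample sentences are assumptions made for this example.

```python
import re

WEEK_DAY = r"monday|tuesday|wednesday|thursday|friday|saturday|sunday"
MONTH = (r"january|february|march|april|may|june|july|august|september|"
         r"october|november|december")
DMY = r"year|day|week|month"

# The prefix and the whitespace after it are optional as one unit;
# \b keeps the pattern from matching inside longer words.
TIMEX = re.compile(
    r"\b(?:(?:this|next|last)\s+)?(?:" + WEEK_DAY + "|" + MONTH + "|" + DMY + r")\b",
    re.IGNORECASE,
)


def extract_timex(text):
    # Hypothetical helper: return the full text of every match.
    return [m.group(0) for m in TIMEX.finditer(text)]


print(extract_timex("LEts have quick meeting on Wednesday at 9am"))
# ['Wednesday']
print(extract_timex("See you this Wednesday or next month"))
# ['this Wednesday', 'next month']
```

Because the prefix is optional, bare words such as "may" or "day" are matched too, which may produce false positives in ordinary prose; whether that is acceptable depends on the text being processed.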