Hello, I'm looking to have the following:
test = re.compile(r' [0-12](am|pm) [1-1000] days from (yesterday|today|tomorrow)')
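match this:
print(test.match(" 3pm 2 days from today"))
It returns None. What am I doing wrong? I'm just getting into regex, and from reading the docs I thought this should work. Any help is appreciated. chrism
I've asked a new question about the design of a system using a process similar to the above in NLP HERE
Answer 0 (score: 5)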
Here's my hat in the ring. Careful study of this regex should teach a few lessons:
import re
reobj = re.compile(
    r"""# Loosely match a date/time reference
    ^                      # Anchor to start of string.
    \s*                    # Optional leading whitespace.
    (?P<time>              # $time: military or AM/PM time.
      (?:                  # Group for military hours options.
        [2][0-3]           # Hour is either 20, 21, 22, 23,
      | [01]?[0-9]         # or 0-9, 00-09 or 10-19
      )                    # End group of military hours options.
      (?:                  # Group for optional minutes.
        :                  # Hours and minutes separated by ":"
        [0-5][0-9]         # 00-59 minutes
      )?                   # Military minutes are optional.
    |                      # or time is given in AM/PM format.
      (?:1[0-2]|0?[1-9])   # 1-12 or 01-12 AM/PM options (hour)
      (?::[0-5][0-9])?     # Optional minutes for AM/PM time.
      \s*                  # Optional whitespace before AM/PM.
      [ap]m                # Required AM or PM (case insensitive)
    )                      # End group of time options.
    \s+                    # Required whitespace.
    (?P<offset> \d+ )      # $offset: count of time increments.
    \s+                    # Required whitespace.
    (?P<units>             # $units: units of time increment.
      (?:sec(?:ond)?|min(?:ute)?|hour|day|week|month|year|decade|century)
      s?                   # Time units may have optional plural "s".
    )                      # End $units: units of time increment.
    \s+                    # Required whitespace.
    (?P<dir>from|before|after|since)  # $dir: Time offset direction.
    \s+                    # Required whitespace.
    (?P<base>yesterday|today|tomorrow|(?:right\s+)?now)  # \s+ because VERBOSE ignores a bare space.
    \s*                    # Optional whitespace before end.
    $                      # Anchor to end of string.""",
    re.IGNORECASE | re.VERBOSE)
match = reobj.match(' 3 pm 2 days from today')
if match:
    print('Time: %s' % (match.group('time')))
    print('Offset: %s' % (match.group('offset')))
    print('Units: %s' % (match.group('units')))
    print('Direction: %s' % (match.group('dir')))
    print('Base time: %s' % (match.group('base')))
else:
    print("No match.")
Output:
r"""
Time: 3 pm
Offset: 2
Units: days
Direction: from
Base time: today
"""
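This regex illustrates a few lessons to be learned.
Modern regular expressions comprise a rich and powerful language. Once you learn the syntax and get into the habit of writing verbose, properly indented, well-commented code, even complex regexes can be easy to write, easy to read and easy to maintain. It is unfortunate that they have acquired a reputation for being difficult, unwieldy and error-prone (and thus unsuitable for complex tasks).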
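One practical detail when writing in this verbose style: under re.VERBOSE, unescaped whitespace in the pattern is ignored, so a literal space has to be spelled out (for example as \s, [ ] or an escaped space). The following is only a small sketch of that behaviour; the names naive and explicit are made up for illustration.

```python
import re

# Under re.VERBOSE the bare space inside "right now" is dropped from the pattern,
# so this effectively compiles to (?:right)?now.
naive = re.compile(r"""(?:right )?now   # the space here is ignored""", re.VERBOSE)

# Writing the whitespace explicitly keeps it significant.
explicit = re.compile(r"""(?:right\s+)?now   # \s+ survives verbose mode""", re.VERBOSE)

print(naive.fullmatch("right now"))      # None
print(naive.fullmatch("rightnow"))       # a match object
print(explicit.fullmatch("right now"))   # a match object
```

If a pattern seems to stop matching after being converted to verbose mode, a swallowed space is a likely cause.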
Happy regexing!
Answer 1 (score: 2)
How about
test = re.compile(r' ([0-9]|1[012])(am|pm) \d+ days from (yesterday|today|tomorrow)')
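The hour part should match 0, 1, ..., 9 or 10, 11, 12, but not 13, 14, ..., 19.
You can restrict the number of days to 1, ..., 1000 in a similar way, i.e. (1000|\d{1,3}). Note that \d{1,3} also accepts 0; (1000|[1-9]\d{0,2}) excludes it.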
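As a quick check of the day-count idea, the sketch below tests two candidate sub-patterns against every integer from 0 to 1099. The names strict and loose are illustrative only, and "no leading zeros" is an assumption about what the question wants.

```python
import re

# 1-1000 without leading zeros: either the literal 1000,
# or a non-zero digit followed by up to two more digits (1-999).
strict = re.compile(r'1000|[1-9]\d{0,2}')

# The simpler form: any one to three digits, or 1000. This one also admits 0.
loose = re.compile(r'1000|\d{1,3}')

accepted = [n for n in range(0, 1100) if strict.fullmatch(str(n))]
print(accepted[0], accepted[-1], len(accepted))   # 1 1000 1000

print(bool(loose.fullmatch('0')))    # True
print(bool(strict.fullmatch('0')))   # False
```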
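Answer 2 (score: 1)
Try this:
import re
test = re.compile(r'^\s[0-1]?[0-9]pm \d+ days from (today|yesterday|tomorrow)$')
print(test.match(" 12pm 2 days from today"))
The problem you're running into is that a regex character class can't express a multi-digit numeric range (AFAIK): [0-12] matches a single character, so multi-digit numbers have to be handled digit by digit.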
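To make the point concrete, here is a minimal sketch showing what the character class in the question really matches. The variable names are illustrative; broken is the pattern from the question and fixed is the one suggested in Answer 1.

```python
import re

# [0-12] is a character class: the range 0-1 plus the literal 2,
# so it matches exactly ONE character out of '0', '1', '2'.
hour_class = re.compile(r'[0-12]')
single_chars = [c for c in '0123456789' if hour_class.fullmatch(c)]
print(single_chars)   # ['0', '1', '2']

broken = re.compile(r' [0-12](am|pm) [1-1000] days from (yesterday|today|tomorrow)')
fixed = re.compile(r' ([0-9]|1[012])(am|pm) \d+ days from (yesterday|today|tomorrow)')

text = " 3pm 2 days from today"
print(broken.match(text))           # None: '3' is not in the class [0-12]
print(fixed.match(text).groups())   # ('3', 'pm', 'today')
```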
Answer 3 (score: 1)
Try this:
test = re.compile(r' \d+(am|pm) \d+ days from (yesterday|today|tomorrow)')
Answer 4 (score: 1)
If you want to extract the matched parts individually, you can label the groups with (?P<name>[match]). For example:
import re
pattern = re.compile(
    r'\s*(?P<time>1?[0-9])(?P<ampm>am|pm)\s+'
    r'(?P<days>[1-9]\d*)\s+days\s+from\s+'
    r'(?P<when>yesterday|today|tomorrow)\s*')
# Exhaustively check that every combination matches and yields all four groups.
for time in range(0, 13):
    for ampm in ('am', 'pm'):
        for days in range(1, 1000):
            for when in ('yesterday', 'today', 'tomorrow'):
                text = ' %d%s %d days from %s ' % (time, ampm, days, when)
                match = pattern.match(text)
                assert match is not None
                keys = sorted(match.groupdict().keys())
                assert keys == ['ampm', 'days', 'time', 'when']
text = ' 3pm 2 days from today '
print(pattern.match(text).groupdict())
Output:
{'time': '3', 'when': 'today', 'days': '2', 'ampm': 'pm'}
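Answer 5 (score: 1)
test = re.compile(r' 1?\d[ap]m \d{1,3} days? from (?:yesterday|today|tomorrow)')
After reading the discussion between Rumple Stiltskin and Demian Brecht, I noticed that my proposal above is poor: it detects a certain string structure, but it does not verify that the string is a valid "time-pattern" string. For example, it would accept " 18pm 2 days from today".
So I now suggest a pattern that detects exactly the strings that satisfy your requirements, and that also flags every string with the same structure as a valid one but whose values are not those required for a valid "time-pattern" string: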
import re
regx = re.compile(r"(?<= )"  # lookbehind: better than a blank as first character
                  r"(?:(1[012]|\d)([ap]m) (?!0 )(\d{1,3}|1000)"  # groups 1-3: validated values
                  r"|"
                  r"(\d+)([ap]m) (\d+))"  # groups 4-6: same structure, any values
                  r" days? from (yesterday|today|tomorrow)")  # shared part (group 7)
for ch in (" 12pm 2 days from today",
           " 4pm 1 day from today",
           " 12pm 0 days from today",
           " 12pm 1001 days from today",
           " 18pm 2 days from today",
           " 1212pm 2 days from today",
           " 12pm five days from today"):
    print(ch)
    mat = regx.search(ch)
    if mat:
        if mat.group(1):
            print(mat.group(1, 2, 3, 7), '\n# time-pattern-VALIDATED string #')
        else:
            print(mat.group(4, 5, 6, 7), '\n* SIMILI-time-pattern STRUCTURED string*')
    else:
        print('- NO STRUCTURED STRING in the text -')
    print()
Result:
12pm 2 days from today
('12', 'pm', '2', 'today')
# time-pattern-VALIDATED string #
4pm 1 day from today
('4', 'pm', '1', 'today')
# time-pattern-VALIDATED string #
12pm 0 days from today
('12', 'pm', '0', 'today')
* SIMILI-time-pattern STRUCTURED string*
12pm 1001 days from today
('12', 'pm', '1001', 'today')
* SIMILI-time-pattern STRUCTURED string*
18pm 2 days from today
('18', 'pm', '2', 'today')
* SIMILI-time-pattern STRUCTURED string*
1212pm 2 days from today
('1212', 'pm', '2', 'today')
* SIMILI-time-pattern STRUCTURED string*
12pm five days from today
- NO STRUCTURED STRING in the text -
If you only need a regex that detects time-pattern-validated strings, use just
regx = re.compile(r"(?<= )(1[012]|\d)([ap]m) (?!0 )(\d{1,3}|1000) days?"
                  r" from (yesterday|today|tomorrow)")
Answer 6 (score: 0)
It is easier (and more readable) to check the integer ranges after matching:
m = re.match(r' (\d+)(?:pm|am) (\d+) days from (yesterday|today|tomorrow)',
" 3pm 2 days from today")
assert m and int(m.group(1)) <= 12 and 1 <= int(m.group(2)) <= 1000
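The same idea can be wrapped in a small function that returns the parsed values instead of asserting. This is only a sketch: parse_reference and PATTERN are hypothetical names, and the lower bound of 1 on the hour is an assumption (the assert above only checks the upper bound).

```python
import re

# Match the structure loosely; ranges are validated afterwards in plain Python.
PATTERN = re.compile(r' (\d+)(am|pm) (\d+) days from (yesterday|today|tomorrow)')

def parse_reference(text):
    """Return (hour, ampm, days, base), or None if malformed or out of range."""
    m = PATTERN.match(text)
    if m is None:
        return None
    hour, days = int(m.group(1)), int(m.group(3))
    # Assumed valid ranges: hour 1-12, days 1-1000.
    if not (1 <= hour <= 12 and 1 <= days <= 1000):
        return None
    return hour, m.group(2), days, m.group(4)

print(parse_reference(" 3pm 2 days from today"))    # (3, 'pm', 2, 'today')
print(parse_reference(" 18pm 2 days from today"))   # None
```

Keeping the regex loose and the range checks in ordinary code means the limits can be changed without touching the pattern.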
Or you can use an existing library, e.g. parsedatetime (pip install parsedatetime):
import parsedatetime.parsedatetime as pdt  # newer releases may need: import parsedatetime as pdt
cal = pdt.Calendar()
print(cal.parse("3pm 2 days from today"))
((2011, 4, 26, 15, 0, 0, 1, 116, -1), 3)