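I'm trying to write a parser for expressions like "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 until 2017-12-03", which represent recurring time intervals. Ultimately, I'd like to be able to initialize a dateutil.rrule object with the parsed fields. However, most rrule parameters are optional, which in the string representation corresponds to patterns that may or may not be present.
The problem is that I can't stop the preceding pattern from being "too greedy". Consider the following example, which contains two test cases: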
import re
import pytest
from dateutil.rrule import FREQNAMES


def match_pattern(string):
    SPACES = r'\s*'
    freq_names = [freq.lower() for freq in FREQNAMES] + [freq.title() for freq in FREQNAMES]
    FREQ_PATTERN = '(?P<freq>{})?'.format("|".join(freq_names))
    START_PATTERN = 'from' + SPACES + r'(?P<start>.+)'
    END_PATTERN = 'till' + SPACES + r'(?P<end>.+)'
    UNTIL_PATTERN = optional('until' + SPACES + r'(?P<until>.+)')
    # UNTIL_PATTERN = 'until' + SPACES + r'(?P<until>.+)'
    PATTERN = SPACES + FREQ_PATTERN \
        + SPACES + START_PATTERN \
        + SPACES + END_PATTERN \
        + SPACES + UNTIL_PATTERN + SPACES
    return re.match(PATTERN, string).groupdict()


def optional(pattern):
    '''Encloses the given regular expression in an optional group (i.e., one that matches 0 or 1 repetitions of the original regular expression).'''
    return '({})?'.format(pattern)


'''Tests'''
def test_match_pattern():
    string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00"
    groups = match_pattern(string)
    assert groups['freq'] == "Weekly"
    assert groups['start'].strip() == "2017-11-03 15:00:00"
    assert groups['end'].strip() == "2017-11-03 16:00:00"


def test_match_pattern_with_until():
    string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 until 2017-12-03"
    groups = match_pattern(string)
    assert groups['freq'] == "Weekly"
    assert groups['start'].strip() == "2017-11-03 15:00:00"
    assert groups['end'].strip() == "2017-11-03 16:00:00"
    assert groups['until'].strip() == "2017-12-03"


if __name__ == "__main__":
    # pytest.main([__file__])
    pytest.main([__file__+"::test_match_pattern", "-s"])
    # pytest.main([__file__+"::test_match_pattern_with_until", "-s"])
Here, I want UNTIL_PATTERN to be optional in the string, so I use the optional function to enclose it in ()?. However, this makes the second test fail:
> assert groups['end'].strip() == "2017-11-03 16:00:00"
E assert '2017-11-03 1...il 2017-12-03' == '2017-11-03 16:00:00'
E - 2017-11-03 16:00:00 until 2017-12-03
E + 2017-11-03 16:00:00
parse_date.py:44: AssertionError
=========================== 1 failed in 0.07 seconds ===========================
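The problem is that once I make UNTIL_PATTERN optional, END_PATTERN is too greedy and consumes everything up to the end of the string. (If I drop the optional() wrapper, the second test passes, but the first one no longer produces a match.)
How can I get both tests to pass?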
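The behavior described above can be reproduced with a stripped-down sketch that uses only the re module. The pattern below is a simplified stand-in for END_PATTERN followed by the optional UNTIL_PATTERN, and the input string is a shortened, made-up example rather than the one from the tests:

```python
import re

# Simplified stand-in: a greedy "end" group followed by an optional "until" group.
greedy = r'till\s*(?P<end>.+)\s*(until\s*(?P<until>.+))?'

m = re.match(greedy, "till 16:00:00 until 2017-12-03")

# The greedy .+ swallows the rest of the string, and since the trailing
# group is optional, the engine never has to backtrack to give anything back.
print(repr(m.group('end')))
print(repr(m.group('until')))
```

Because everything after the greedy group is allowed to match nothing, the first attempt already succeeds and the until group stays empty.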
Answer 0 (score: 2)
Two small modifications are enough. First, make END_PATTERN non-greedy by changing its capture group to (?P<end>.+?).
But now, since it will match as little as possible, you have to force the match to extend to the end of the string with an end-of-string anchor, $.
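Put together, the two changes might look like the sketch below. It reuses the names from the question. To keep it runnable without dateutil installed, FREQNAMES is replaced by a hard-coded list, which is an assumption about what dateutil.rrule.FREQNAMES contains. Placing the anchor after the final SPACES is also an assumption about where it goes.

```python
import re

# Assumption: hard-coded stand-in for dateutil.rrule.FREQNAMES.
FREQNAMES = ['YEARLY', 'MONTHLY', 'WEEKLY', 'DAILY', 'HOURLY', 'MINUTELY', 'SECONDLY']


def optional(pattern):
    '''Wrap a pattern in a group that matches zero or one time.'''
    return '({})?'.format(pattern)


def match_pattern(string):
    SPACES = r'\s*'
    freq_names = [f.lower() for f in FREQNAMES] + [f.title() for f in FREQNAMES]
    FREQ_PATTERN = '(?P<freq>{})?'.format("|".join(freq_names))
    START_PATTERN = 'from' + SPACES + r'(?P<start>.+)'
    # Change 1: .+? is non-greedy, so it stops as early as the rest allows.
    END_PATTERN = 'till' + SPACES + r'(?P<end>.+?)'
    UNTIL_PATTERN = optional('until' + SPACES + r'(?P<until>.+)')
    # Change 2: the trailing $ forces the whole string to be consumed.
    PATTERN = (SPACES + FREQ_PATTERN
               + SPACES + START_PATTERN
               + SPACES + END_PATTERN
               + SPACES + UNTIL_PATTERN + SPACES + '$')
    return re.match(PATTERN, string).groupdict()


without_until = match_pattern(
    "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00")
with_until = match_pattern(
    "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 until 2017-12-03")
print(without_until)
print(with_until)
```

Without the anchor, the non-greedy group would stop after a single character, because everything after it is allowed to match nothing. With the anchor, the engine keeps extending the end group until either the until clause or the end of the string follows.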