Extracting a few optional tokens

Date: 2010-08-02 19:54:50

Tags: python regex string

I need to parse time tokens out of a string, where each token is optional. Sample inputs:

  • TT-5d10h
  • TT-5d10h30m
  • TT-5d30m
  • TT-10h30m
  • TT-5D
  • TT-10H
  • TT-30M

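How can I parse these out in Python, in order of precedence (days, hours, minutes)?

5 answers:

Answer 0 (score: 4)

This program returns three integers (days, hours, minutes) for each input: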

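import re

samples = ['tt-5d10h', 'tt-5d10h30m', 'tt-5d30m', 'tt-10h30m', 'tt-5d', 'tt-10h', 'tt-30m']

def parse(text):
    # Each unit is an optional non-capturing group wrapping one captured number.
    match = re.match(r'tt-(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?', text)
    # groups(0) substitutes 0 for any group that did not take part in the match.
    values = [int(x) for x in match.groups(0)]
    return values

for sample in samples:
    print(parse(sample))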

Output:

[5, 10, 0]
[5, 10, 30]
[5, 0, 30]
[0, 10, 30]
[5, 0, 0]
[0, 10, 0]
[0, 0, 30]
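
The samples in the question are written with an uppercase `TT-` prefix and sometimes uppercase unit letters, while the pattern above is lowercase and `re.match` only anchors the start of the string. A minimal sketch of a stricter variant follows, assuming the input should be matched case-insensitively and in full. The names `TOKEN_RE` and `parse_duration` are hypothetical and not part of the original answer.

```python
import re

# Hypothetical names; same pattern as above, compiled case-insensitively.
TOKEN_RE = re.compile(r'tt-(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?', re.IGNORECASE)

def parse_duration(text):
    """Return [days, hours, minutes], or None if text is not a valid tag."""
    # fullmatch requires the whole string to match, so trailing junk is rejected.
    match = TOKEN_RE.fullmatch(text)
    # Also reject a bare 'tt-' in which no unit was present at all.
    if match is None or not any(match.groups()):
        return None
    # groups(0) substitutes 0 for every optional group that did not match.
    return [int(x) for x in match.groups(0)]

print(parse_duration('TT-5d10h'))   # [5, 10, 0]
print(parse_duration('TT-30M'))     # [0, 0, 30]
print(parse_duration('TT-10h5d'))   # None (units out of order)
```

Returning `None` rather than raising is a design choice for this sketch; raising `ValueError` would work equally well if invalid tags should never be ignored.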

Answer 1 (score: 2)

>>> import re
>>> pattern = re.compile(r"tt-(\d+d)?(\d+h)?(\d+m)?")
>>> results = pattern.match("tt-5d10h")
>>> days, hours, minutes = results.groups()
>>> days, hours, minutes
('5d', '10h', None)
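
Note that each captured group here still carries its unit letter, and missing tokens come back as `None`. One possible way to turn that result into integers is sketched below; `to_ints` is a hypothetical helper, not part of the answer.

```python
import re

pattern = re.compile(r"tt-(\d+d)?(\d+h)?(\d+m)?")

def to_ints(text):
    """Hypothetical helper: return (days, hours, minutes) as integers."""
    groups = pattern.match(text).groups()
    # Each captured group looks like '5d'; drop the trailing unit letter.
    # Groups that did not match are None and are mapped to 0.
    return tuple(int(g[:-1]) if g else 0 for g in groups)

print(to_ints("tt-5d10h"))  # (5, 10, 0)
```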

Answer 2 (score: 1)

Similar to compie's answer, but with a nicer final result:

re.match(r'tt-(?:(?P<days>\d+)d)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?', text).groupdict()

Example:

>>> import re
>>> s = ['tt-5d10h', 'tt-5d10h30m', 'tt-5d30m', 'tt-10h30m', 'tt-5d', 'tt-10h', 'tt-30m']
>>> for text in s:
...     print(re.match(r'tt-(?:(?P<days>\d+)d)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?', text).groupdict())
...
{'hours': '10', 'minutes': None, 'days': '5'}
{'hours': '10', 'minutes': '30', 'days': '5'}
{'hours': None, 'minutes': '30', 'days': '5'}
{'hours': '10', 'minutes': '30', 'days': None}
{'hours': None, 'minutes': None, 'days': '5'}
{'hours': '10', 'minutes': None, 'days': None}
{'hours': None, 'minutes': '30', 'days': None}

If you want 0 in place of the missing tokens, just use groupdict(0) instead of groupdict().
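
With `groupdict(0)` the matched values are still strings while the defaults are the integer 0, so the result mixes types. Because the group names happen to match keyword arguments of `datetime.timedelta`, one way to normalize them is sketched below. `PATTERN` and `to_timedelta` are hypothetical names, and converting to a `timedelta` is an assumption about what the values are used for.

```python
import re
from datetime import timedelta

# Hypothetical name; the pattern itself is the one from the answer.
PATTERN = re.compile(
    r'tt-(?:(?P<days>\d+)d)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?'
)

def to_timedelta(text):
    """Hypothetical helper: parse a tag such as 'tt-5d10h' into a timedelta."""
    fields = PATTERN.match(text).groupdict(0)
    # Matched groups are str and defaults are int 0; int() normalizes both.
    return timedelta(**{name: int(value) for name, value in fields.items()})

print(to_timedelta('tt-5d10h30m'))  # 5 days, 10:30:00
```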

Answer 3 (score: 1)

Using str.partition:

inputstring="""tt-5d10h
tt-5d10h30m
tt-5d30m
tt-10h30m
tt-5d
tt-10h
tt-30m
"""
separators=('d','h','m')
result=[]
for text in (item.lstrip('t-') for item in inputstring.splitlines()):
    data=[]
    for sep in separators:
        d,found,text = text.partition(sep)
        if found: data.append(int(d.rstrip(sep)))
        else:
            data.append(0)
            text=d
    result.append(data)
# show input and result
for respairs in zip(inputstring.splitlines(),result): print(respairs)
""" Output:
('tt-5d10h', [5, 10, 0])
('tt-5d10h30m', [5, 10, 30])
('tt-5d30m', [5, 0, 30])
('tt-10h30m', [0, 10, 30])
('tt-5d', [5, 0, 0])
('tt-10h', [0, 10, 0])
('tt-30m', [0, 0, 30])
"""

Answer 4 (score: 1)

Here is a pyparsing approach to your problem:

tests = """tt-5d10h 
tt-5d10h30m 
tt-5d30m 
tt-10h30m 
tt-5d 
tt-10h 
tt-30m""".splitlines()

from pyparsing import Word,nums,Optional

integer = Word(nums).setParseAction(lambda t:int(t[0]))

timeFormat = "tt-" + (
                Optional(integer("days") + "d") +
                Optional(integer("hrs")  + "h") +
                Optional(integer("mins") + "m")
                )

def normalizeTime(tokens):
    return tuple(tokens[field] if field in tokens else 0 
                for field in "days hrs mins".split())

timeFormat.setParseAction(normalizeTime)

for test in tests:
    print("%-12s ->" % test, end=" ")
    print("%d %02d:%02d" % timeFormat.parseString(test)[0])

Prints:

tt-5d10h     -> 5 10:00
tt-5d10h30m  -> 5 10:30
tt-5d30m     -> 5 00:30
tt-10h30m    -> 0 10:30
tt-5d        -> 5 00:00
tt-10h       -> 0 10:00
tt-30m       -> 0 00:30

Or, to keep the named results:

def normalizeTime(tokens):
    for field in "days hrs mins".split():
        if field not in tokens:
            tokens[field] = 0

timeFormat.setParseAction(normalizeTime)

for test in tests:
    print("%-12s ->" % test, end=" ")
    print("%(days)d %(hrs)02d:%(mins)02d" % timeFormat.parseString(test))