I want to convert a string (subtitles) from:
585
00:59:59,237 --> 01:00:01,105
- It's all right. - He saw us!
586
01:00:01,139 --> 01:00:03,408
I heard you the first time.
to
59:59 - It's all right. - He saw us!
01:00:01 I heard you the first time.
*What I want: if the time is under one hour, trim the "00:" prefix; if it is one hour or more, keep it.*
My regex is:
pat = re.compile(r"""
#\s* # Skip leading whitespace
\d+\s # remove lines that contain only numbers
((?:(?:00)|(?P<hour>01)):(?P<time>\d{2}:\d{2})[,0-9->]+.*)[\r\n]+(?P<content>.*)[\r\n]+
""",
re.VERBOSE)
data = pat.sub(r"\g<hour>\g<time> \g<content>", data)
It only works when "\g<hour>" is not used.
Can anyone help me?
Answer 0 (score: 2)
I think this is what you are looking for:
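import re

s = """
585
00:59:59,237 --> 01:00:01,105
- It's all right. - He saw us!
586
01:00:01,139 --> 01:00:03,408
I heard you the first time.
"""

# Each match is a tuple: (hour prefix, minutes:seconds, subtitle text).
for line in re.findall(r'(\d+:)(\d+:\d+)(?:.*\n)(.*)', s):
    if line[0] == '00:':
        # Under one hour: drop the "00:" prefix.
        print(' '.join(line[1:]))
    else:
        # One hour or more: keep the hour prefix.
        print(' '.join([''.join(line[0:2]), line[2]]))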
Output:
# 59:59 - It's all right. - He saw us!
# 01:00:01 I heard you the first time.
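The same result can also be reached with a single substitution, which is closer to the attempt in the question. As far as I know, Python versions before 3.5 raise an "unmatched group" error when the replacement template refers to a group that did not take part in the match, which is what happens to \g<hour> whenever the hour is 00. A minimal sketch that avoids this by passing a function to re.sub follows; the names SRT_BLOCK, shorten and convert are made up for illustration, and the pattern assumes exactly one line of text per subtitle.

```python
import re

# Hypothetical pattern for one subtitle block: index line, timing line, one text line.
SRT_BLOCK = re.compile(
    r"""
    \d+\s*\n                                 # index line containing only digits
    (?P<hour>\d{2}):(?P<time>\d{2}:\d{2})    # start time as HH:MM:SS
    [^\n]*\n                                 # rest of the timing line
    (?P<content>[^\n]*)                      # a single line of subtitle text
    \n*                                      # trailing newline(s), including blank separators
    """,
    re.VERBOSE,
)


def shorten(match):
    """Return 'MM:SS text' under one hour, 'HH:MM:SS text' otherwise."""
    hour = match.group("hour")
    prefix = "" if hour == "00" else hour + ":"
    return "{}{} {}\n".format(prefix, match.group("time"), match.group("content"))


def convert(data):
    """Rewrite every subtitle block in data with a single substitution pass."""
    return SRT_BLOCK.sub(shorten, data)


sample = (
    "585\n"
    "00:59:59,237 --> 01:00:01,105\n"
    "- It's all right. - He saw us!\n"
    "586\n"
    "01:00:01,139 --> 01:00:03,408\n"
    "I heard you the first time.\n"
)
print(convert(sample))
```

Because the hour is always captured and the decision is made in shorten, no group is ever left unmatched, so the replacement behaves the same on old and new Python versions.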
Answer 1 (score: 1)
Just to offer a non-regex approach (should be faster):
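a = """585
00:59:59,237 --> 01:00:01,105
- It's all right. - He saw us!

586
01:00:01,139 --> 01:00:03,408
I heard you the first time."""

# Assumes every block is exactly four lines: index, timing, text, blank separator.
for i, x in enumerate(a.split('\n')):
    m = i % 4
    if m == 0 or m == 3:
        # Skip the index line and the blank separator.
        continue
    elif m == 1:
        # Keep HH:MM:SS of the start time and drop a leading "00:" hour.
        start = x[:x.find(",")]
        if start.startswith("00:"):
            start = start[3:]
    elif m == 2:
        print(start + " " + x)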
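The fixed four-line stride above breaks as soon as a subtitle spans more than one line of text. A rough sketch of a more tolerant non-regex variant is shown below. It assumes blocks are separated by blank lines and that line endings are "\n"; the function name convert_blocks and the two-line sample subtitle are invented for illustration and are not part of the question's data.

```python
def convert_blocks(srt_text):
    """Return one 'time text' string per subtitle block."""
    lines_out = []
    # Assumption: blocks are separated by one or more blank lines.
    for block in srt_text.strip().split("\n\n"):
        rows = [row for row in block.split("\n") if row.strip()]
        if len(rows) < 3:
            continue  # incomplete block: needs index, timing and text
        # The timing line starts with "HH:MM:SS,mmm"; keep the part before the comma.
        start = rows[1].split(",", 1)[0]
        if start.startswith("00:"):
            start = start[3:]  # under one hour: drop the "00:" prefix
        # Join multi-line subtitle text into a single line.
        text = " ".join(row.strip() for row in rows[2:])
        lines_out.append(start + " " + text)
    return lines_out


example = (
    "585\n"
    "00:59:59,237 --> 01:00:01,105\n"
    "- It's all right.\n"
    "- He saw us!\n"
    "\n"
    "586\n"
    "01:00:01,139 --> 01:00:03,408\n"
    "I heard you the first time.\n"
)
for entry in convert_blocks(example):
    print(entry)
```

Splitting on blank lines first means the number of text lines per block no longer matters, at the cost of requiring the separators to be present.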