I have a string like this:
str = "something move 11 something move 12 something 13 copy 14 15"
where "something" stands for some text, or for no text at all.
From it I would like to get this list:
[('move', 11, ''), ('move', 12, 13), ('copy', 14, 15)]
I tried this:
re.findall('(move|copy).+?([0-9]+).+?([0-9]+)*', str)
but it gives me this output:
[('move', 11, ''), ('move', 12, ''), ('copy', 14, '')]
I understand this happens because the last number is optional, but I don't know how to make it work.
How can I do this?
Answer 0 (score: 1)
You can use a regular expression (with lookbehind and lookahead):
In [1]: import re
In [2]: tokens = "something move 11 something move 12 something 13 copy 14 15"
In [3]: split_movements = re.split(r'(?<=\d)\s(?!\d+)', tokens)
In [4]: split_movements
Out[4]: ['something move 11', 'something move 12', 'something 13', 'copy 14 15']
In [5]: movements = [re.split(r'\s(?=\d+)', m) for m in split_movements]
In [6]: movements
Out[6]:
[['something move', '11'],
 ['something move', '12'],
 ['something', '13'],
 ['copy', '14', '15']]
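The session above stops at nested lists, which is not yet the list of tuples the question asks for. The following is only a sketch of one way to finish the job, not part of the original answer. The names `COMMANDS` and `to_tuples` are made up for illustration, and the sketch assumes that `move` and `copy` are the only commands and that a chunk without a command word belongs to the previous command.

```python
import re

COMMANDS = ("move", "copy")  # assumption: the only two command words


def to_tuples(text):
    # Step 1 (from the answer): split after a number unless another number follows.
    chunks = re.split(r"(?<=\d)\s(?!\d)", text)
    collected = []
    for chunk in chunks:
        # Step 2 (from the answer): separate the leading words from the numbers.
        head, *numbers = re.split(r"\s(?=\d)", chunk)
        # Look for the last command word among the leading words, if any.
        command = next((w for w in reversed(head.split()) if w in COMMANDS), None)
        if command is not None:
            collected.append([command] + numbers)
        elif collected:
            # No command in this chunk: attach its numbers to the previous command.
            collected[-1].extend(numbers)
    # Pad or trim the arguments so every tuple has exactly two argument slots.
    return [(c[0],) + tuple((c[1:] + ["", ""])[:2]) for c in collected]


print(to_tuples("something move 11 something move 12 something 13 copy 14 15"))
# [('move', '11', ''), ('move', '12', '13'), ('copy', '14', '15')]
```

The numbers stay strings, as they do in the session above; wrap them in `int()` if integers are needed.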
Answer 1 (score: 1)
Based on @Ashwini Chaudhary's answer:
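#!/usr/bin/env python
import re

commands = "copy move".split()
input_string = "something move 11 something move 12 something 13 copy 14 15"
# Splitting on a capturing group keeps the command words in the resulting list.
tokens = iter(re.split("(%s)" % "|".join(map(re.escape, commands)), input_string))
result = []
for tok in tokens:
    if tok in commands:
        # The token that follows a command holds its numeric arguments.
        args = re.findall(r"\d+", next(tokens, ""))
        # Pad with empty strings so every tuple has a command and two slots.
        result.append((tok,) + tuple(args) + ("",)*(2 - len(args)))
print(result)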
[('move', '11', ''), ('move', '12', '13'), ('copy', '14', '15')]
To limit each command to two arguments, just use slicing: tuple(args[:2]).
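As a rough sketch of how the slicing remark fits into the loop, the same logic can be wrapped in a function. The function name `parse_commands` and the `max_args` parameter are hypothetical and not part of the original answer.

```python
import re


def parse_commands(text, commands=("copy", "move"), max_args=2):
    # Split on the command words; the capturing group keeps them in the output.
    pattern = "(%s)" % "|".join(map(re.escape, commands))
    tokens = iter(re.split(pattern, text))
    result = []
    for tok in tokens:
        if tok in commands:
            # Slice the numbers found after the command to at most max_args.
            args = re.findall(r"\d+", next(tokens, ""))[:max_args]
            # Pad with empty strings so every tuple has the same length.
            result.append((tok,) + tuple(args) + ("",) * (max_args - len(args)))
    return result


print(parse_commands("move 1 2 3 copy 4"))
# [('move', '1', '2'), ('copy', '4', '')]
```

Slicing before padding keeps every tuple the same length, so a third number after a command is dropped rather than producing a longer tuple.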