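I have the following code that extracts the first and last groups of digits from a given string, according to the cases outlined below. It works, but it does not seem optimal: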
import re
# case 1
pattern = r'\d+\ \d+'
string = 'Hello 999 888999'
test = re.findall(pattern, string, flags=0)[0].split()
print('{0}, {1}'.format(test[0], test[len(test)-1]))
# case 2
pattern = r'\d+\ \d+;\d+ \d+'
string = 'How are things 999 888999;222 444'
test = re.findall(pattern, string, flags=0)[0].split()
print('{0}, {1}'.format(test[0], test[len(test)-1]))
# case 3
pattern = r'\d+\ \d+;\d+ \d+;\d+ \d+'
string = 'It is nice 999 888999;222 444;33 55'
test = re.findall(pattern, string, flags=0)[0].split()
print('{0}, {1}'.format(test[0], test[len(test)-1]))
# case 4
pattern = r'\d+\ \d+;\d+ \d+;\d+ \d+;\d+ \d+'
string = 'Please help yourself 999 888999;222 444;33 55;44 6661'
test = re.findall(pattern, string, flags=0)[0].split()
print('{0}, {1}'.format(test[0], test[len(test)-1]))
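These are the 4 cases, with one to four semicolon-separated pairs of numbers.
Any suggestions on how to handle all of them in one go?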
Answer 0 (score: 1)
This sounds like a common pattern: you want to find the initial run of digits and the final run of digits. You can use
(\d+).*?(\d+$)
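This matches and captures a run of digits as early as possible, then lazily repeats any character until it reaches another run of digits that is followed by the end of the string.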
import re
pattern = re.compile(r'(\d+).*?(\d+$)')
for text in ['Hello 999 888999', 'How are things 999 888999;222 444', 'It is nice 999 888999;222 444;33 55', 'Please help yourself 999 888999;222 444;33 55;44 6661']:
    # Group 1 is the first run of digits, group 2 the run at the end of the string.
    match = pattern.search(text)
    print(', '.join(match.groups()))
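The loop above assumes every string matches. As a minimal sketch, the same pattern can be wrapped in a helper that returns None when there is no match; the name first_and_last is hypothetical and not part of the answer.

```python
import re

# Same pattern as in the answer: first run of digits, lazy filler,
# then a run of digits anchored at the end of the string.
PATTERN = re.compile(r'(\d+).*?(\d+$)')

def first_and_last(text):
    """Return (first, last) digit runs, or None if the pattern does not match.

    Hypothetical helper, added for illustration only.
    """
    match = PATTERN.search(text)
    return match.groups() if match else None

print(first_and_last('Hello 999 888999'))
print(first_and_last('no digits here'))
```

One caveat with this pattern: the string has to end in digits, and if it contains only a single run of digits, backtracking splits that run in two, so 'abc 123' gives ('12', '3').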
Answer 1 (score: 0)
You can try the following:
import re
pattern = re.compile(r'(\d+\s\d+(;)?){1,4}')
texts = ['Hello 999 888999', 'How are things 999 888999;222 444', 'It is nice 999 888999;222 444;33 55',
         'Please help yourself 999 888999;222 444;33 55;44 6661']
for text in texts:
    match = pattern.search(text)
    if match:
        # Splitting on whitespace leaves the first and last numbers as separate items.
        split = match.group().split()
        print('{0}, {1}'.format(split[0], split[-1]))
Output
999, 888999
999, 444
999, 55
999, 6661
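Regular expression
(\d+\s\d+(;)?){1,4} repeats the pattern \d+\s\d+(;)? 1, 2, 3, or 4 times. The pattern is almost the same as yours:
\d+ one or more digits
\s a whitespace character (here, the space)
\d+ one or more digits
(;)? an optional semicolon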
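As an alternative not taken from either answer, one could collect every run of digits with re.findall and keep the first and last items. This is a minimal sketch; the name first_last_numbers is hypothetical.

```python
import re

def first_last_numbers(text):
    """Return the first and last runs of digits in text, or None if there are none.

    Hypothetical helper, added for illustration only.
    """
    numbers = re.findall(r'\d+', text)  # every maximal run of digits, in order
    if not numbers:
        return None
    return numbers[0], numbers[-1]

for text in ['Hello 999 888999', 'How are things 999 888999;222 444']:
    print(first_last_numbers(text))
```

Unlike the patterns in the answers, this does not check that the numbers form space-separated pairs joined by semicolons, so digits anywhere else in the text are picked up too. If the string holds only one number, that number is returned as both first and last.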