I have a list containing several values, for example:
l = [
'210-4521268-18',
'210.0622277.13',
'rachid 312-0653348-08',
'3000401732 00000 064 77063',
....,
'312-0653348-08 rachid'
]
I only want to get the items in the format "210.0622277.13", which corresponds to the following regular expression:
r'\d{3}\D?\d{7}\D?\d{2}'
So far, I have written the following regular expression to fetch these values:
import re

regex = re.compile(r'((\d{3}\D?\d{7}\D?\d{2}$)|(^\d{3}\D?\d{7}\D?\d{2}))')
# loop through the list to fetch desired part of value
for line in l:
    match = regex.search(line)
    if match:
        print('line : {} found a match {}'.format(line, line[match.start():match.end()]))
    else:
        print('line : {} found no match'.format(line))
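The problem is that the value '3000401732 00000 064 77063' also matches.
How can I refine this regular expression so that it no longer accepts extra digits after the desired pattern? If more digits follow the pattern, the value should be discarded.
The matches I need to capture are: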
l = [
'210-4521268-18',
'210.0622277.13',
'312-0653348-08',
'312-0653348-08'
]
So the output would look like this:
line : 210-4521268-18 found a match 210-4521268-18
line : 210.0622277.13 found a match 210.0622277.13
line : rachid 312-0653348-08 found a match 312-0653348-08
line : 3000401732 00000 064 77063 found no match
line : 312-0653348-08 rachid found a match 312-0653348-08
Answer 0 (score: 2)
This should work for you:
\d{3}[^\d]\d{7}[^\d]\d{2}
Live demo here
Explanation:
\d{3} : matches 3 digits
[^\d]\d{7} : matches one non-digit character, then 7 digits
[^\d]\d{2} : matches one more non-digit character, then 2 digits
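As a quick check, here is a minimal sketch that runs the pattern above over the sample values from the question. The helper name find_id and the variable samples are hypothetical and only used for illustration. Because each separator is now required rather than optional, the long space-separated number should no longer match.

```python
import re

# Pattern from the answer above; [^\d] is equivalent to \D (any non-digit).
PATTERN = re.compile(r'\d{3}[^\d]\d{7}[^\d]\d{2}')

def find_id(line):
    """Return the first 3-7-2 digit group found in line, or None."""
    match = PATTERN.search(line)
    return match.group() if match else None

# Sample values taken from the question.
samples = [
    '210-4521268-18',
    '210.0622277.13',
    'rachid 312-0653348-08',
    '3000401732 00000 064 77063',
    '312-0653348-08 rachid',
]

for line in samples:
    print('line : {} -> {}'.format(line, find_id(line)))
```

Note that this pattern accepts any non-digit as a separator, including a space or a letter, so it is looser than one that lists "." and "-" explicitly.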
Answer 1 (score: 1)
You can try this:
import re
l = [
'210-4521268-18',
'210.0622277.13',
'rachid 312-0653348-08',
'3000401732 00000 064 77063',
'312-0653348-08 rachid'
]
# keep only lines containing "." or "-" next to digits, then extract the first digit group
final_vals = [re.findall(r'\d+[\W]\d+[\W]\d+', i)[0] for i in l if re.findall(r'\d+\.|-\d+\.|-\d+', i)]
Output:
['210-4521268-18', '210.0622277.13', '312-0653348-08', '312-0653348-08']
Answer 2 (score: 1)
Use the following approach:
import re

l = [
'210-4521268-18',
'210.0622277.13',
'rachid 312-0653348-08',
'3000401732 00000 064 77063',
'312-0653348-08 rachid'
]
# require an explicit "." or "-" between the digit groups
regex = re.compile(r'\d{3}(?:\.|-)\d{7}(?:\.|-)\d{2}')
for line in l:
    match = regex.search(line)
    if match:
        print('line : {} found a match {}'.format(line, match.group()))
    else:
        print('line : {} found no match'.format(line))
Output:
line : 210-4521268-18 found a match 210-4521268-18
line : 210.0622277.13 found a match 210.0622277.13
line : rachid 312-0653348-08 found a match 312-0653348-08
line : 3000401732 00000 064 77063 found no match
line : 312-0653348-08 rachid found a match 312-0653348-08
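Answer 3 (score: 0)
Those are valid matches: the pattern does occur inside the string being searched.
Try adding ^ at the front and $ at the end, and/or otherwise specifying that any other data should fail to match.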
regex = re.compile(r'^((\d{3}\D?\d{7}\D?\d{2}$)|(^\d{3}\D?\d{7}\D?\d{2}))$')
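To illustrate why the question's pattern accepts the long number, here is a small sketch. It also shows one possible way to reject adjacent digits without anchoring the whole line, using lookarounds; that alternative is an assumption of this example and not part of the answer above, and the names loose and strict are hypothetical.

```python
import re

# Start-anchored branch of the question's pattern: \D? makes each separator optional.
loose = re.compile(r'^\d{3}\D?\d{7}\D?\d{2}')

# Assumed alternative: require a separator and forbid a digit on either side.
strict = re.compile(r'(?<!\d)\d{3}\D\d{7}\D\d{2}(?!\d)')

line = '3000401732 00000 064 77063'

# The first \D? matches nothing, so 3 + 7 digits are read as one run.
print(loose.search(line).group())   # 3000401732 00
print(strict.search(line))          # None

# Surrounding text is still allowed, trailing digits are not.
print(strict.search('rachid 312-0653348-08').group())  # 312-0653348-08
print(strict.search('210-4521268-189'))                # None
```

Unlike a pattern wrapped in ^ and $, the lookaround version still finds the value inside lines such as 'rachid 312-0653348-08'.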
Answer 4 (score: 0)
You can add a filter on the dot "." like this:
import re
l = [
'210-4521268-18',
'210.0622277.13',
'rachid 312-0653348-08',
'3000401732 00000 064 77063',
'312-0653348-08 rachid'
]
regex = re.compile(r'\b(\w+[.]\w+)')
# loop through the list to fetch desired part of value
for line in l:
    match = regex.search(line)
    if match:
        print('line : {} found a match {}'.format(line, line[match.start():match.end()]))
    else:
        print('line : {} found no match'.format(line))
As a result I got:
line : 210-4521268-18 found no match
line : 210.0622277.13 found a match 210.0622277
line : rachid 312-0653348-08 found no match
line : 3000401732 00000 064 77063 found no match
line : 312-0653348-08 rachid found no match
Answer 5 (score: 0)
Try stating the separators explicitly and marking the start and end of the string.
r'^\d{3}[^\d]\d{7}[^\d]\d{2}$'
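A pattern anchored at both ends only accepts lines that consist of the value and nothing else. The sketch below shows that effect with re.fullmatch, which behaves much like wrapping the pattern in ^ and $. It assumes the final group has two digits, as in the question's pattern, and the helper name is_exact_id is hypothetical.

```python
import re

# Assumes a 3-7-2 digit layout with one non-digit separator between groups.
ID_PATTERN = re.compile(r'\d{3}[^\d]\d{7}[^\d]\d{2}')

def is_exact_id(line):
    """Return True only if the whole line is a single 3-7-2 value."""
    return ID_PATTERN.fullmatch(line) is not None

for line in ['210.0622277.13',
             'rachid 312-0653348-08',
             '3000401732 00000 064 77063']:
    print('line : {} exact match: {}'.format(line, is_exact_id(line)))
```

With this approach, lines that carry extra text such as 'rachid 312-0653348-08' are rejected as well, which differs from the output requested in the question.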