Python regex fails to capture data

Time: 2018-11-26 02:05:05

Tags: python regex-group

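Question: My goal is to extract every phone number from a file. I can get nearly all of them, but the second-to-last line of the data file contains a user named "suneja, amit". His number is still captured at step 3, which uses three groups, but once I add the fourth group it no longer appears in the results.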

Here is the data file:

Love, Kenneth   kenneth@teamtreehouse.com   +1 (555) 555-5555   Teacher, Treehouse  @kennethlove
McFarland, Dave dave@teamtreehouse.com  (555) 555-5554  Teacher, Treehouse
Arthur, King    king_arthur@camelot.co.uk   King, Camelot
Österberg, Sven-Erik    governor@norrbotten.co.se       Governor, Norrbotten    @sverik
, Tim   tim@killerrabbit.com        Enchanter, Killer Rabbit Cave
Carson, Ryan    ryan@teamtreehouse.com  (555) 555-5543  CEO, Treehouse  @ryancarson
Doctor, The doctor+companion@tardis.co.uk       Time Lord, Gallifrey
Exampleson, Example me@example.com  +1-555-555-5552 Example, Example Co.    @example
Obama, Barack   president.44@us.gov 555 555-5551    President, United States of America @potus44
Chalkley, Andrew    andrew@teamtreehouse.com    (555) 555-5553  Teacher, Treehouse  @chalkers
Vader, Darth    darth-vader@empire.gov  (555).555.4444  Sith Lord, Galactic Empire  @darthvader
suneja, amit    amit.suneja007@gmail.com 444-444444   B102, City Center @programmer
Fernández de la Vega Sanz, María Teresa  mtfvs@spain.gov     First Deputy Prime Minister, Spanish Govt.

Here is my code:

import re
data_file = 'names.txt'

with open(data_file, 'r', encoding="utf-8") as myfile:
    data_dump = myfile.read()

print("___________________________________")
print(re.findall(r"(\+\d[\-\s])", data_dump))
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)", data_dump))
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})", data_dump))
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})", data_dump))
print(len(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})", data_dump)))

Here is the output of my code:

___________________________________
['+1 ', '+1-']
___________________________________
[('+1 ', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '554'), ('+1-', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '444'), ('', '007'), ('', '444'), ('', '444'), ('', '444'), ('', '102')]
___________________________________
[('+1 ', '(555)', ' 555'), ('', '(555)', ' 555'), ('', '(555)', ' 555'), ('+1-', '555', '-555'), ('', '555', ' 555'), ('', '(555)', ' 555'), ('', '(555)', '.555'), ('', '444', '-444')]
___________________________________
[('+1 ', '(555)', ' 555', '-5555'), ('', '(555)', ' 555', '-5554'), ('', '(555)', ' 555', '-5543'), ('+1-', '555', '-555', '-5552'), ('', '555', ' 555', '-5551'), ('', '(555)', ' 555', '-5553'), ('', '(555)', '.555', '.4444')]
7

1 answer:

Answer 0 (score: 0)

You only need a small change to your last regular expression to make it work:

(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]?\d{3,6})

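Only the last capture group changes: a question mark is added after the character class, and the minimum digit count drops from 4 to 3: ([\s.-]?\d{3,6})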

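The question mark in that group makes [\s.-] optional. In the last phone number, 444-444444, no separator sits between the second block of three digits and the remaining three digits, so the separator has to be optional and the final group has to accept as few as three digits.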
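To check the difference, here is a minimal sketch that runs both patterns over three lines adapted from the data file. The name `sample` and the exact whitespace between columns are assumptions made for illustration, and `finditer` with `group(0)` is used here only to show each number as one string instead of a tuple of groups.

```python
import re

# Three lines adapted from the question's data file; column spacing is assumed.
sample = (
    "Love, Kenneth   kenneth@teamtreehouse.com   +1 (555) 555-5555   Teacher, Treehouse  @kennethlove\n"
    "Vader, Darth    darth-vader@empire.gov  (555).555.4444  Sith Lord, Galactic Empire  @darthvader\n"
    "suneja, amit    amit.suneja007@gmail.com 444-444444   B102, City Center @programmer\n"
)

# Pattern from the question: the last group requires a separator and 4 to 6 digits.
original = re.compile(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})")

# Pattern from the answer: the separator is optional and 3 to 6 digits are accepted.
fixed = re.compile(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]?\d{3,6})")

# group(0) is the whole match, so each phone number comes back as a single string.
original_numbers = [m.group(0) for m in original.finditer(sample)]
fixed_numbers = [m.group(0) for m in fixed.finditer(sample)]

print(original_numbers)  # ['+1 (555) 555-5555', '(555).555.4444']
print(fixed_numbers)     # ['+1 (555) 555-5555', '(555).555.4444', '444-444444']

# findall still returns the four groups per match, as in the question's output.
print(fixed.findall(sample)[-1])  # ('', '444', '-444', '444')
```

With the original pattern, "444-444444" fails because after "444" and "-444" the next character is a digit, not a separator. The relaxed last group matches the trailing "444" directly.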