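Question: My goal is to extract every phone number from a file. I can match most of them, but the second-to-last line of the data file contains a user named "suneja, amit" whose number is missed. I can match it up through step 3 of my code, which uses three capture groups, but it no longer appears once I add the fourth group.
Here is the data file: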
Love, Kenneth kenneth@teamtreehouse.com +1 (555) 555-5555 Teacher, Treehouse @kennethlove
McFarland, Dave dave@teamtreehouse.com (555) 555-5554 Teacher, Treehouse
Arthur, King king_arthur@camelot.co.uk King, Camelot
Österberg, Sven-Erik governor@norrbotten.co.se Governor, Norrbotten @sverik
, Tim tim@killerrabbit.com Enchanter, Killer Rabbit Cave
Carson, Ryan ryan@teamtreehouse.com (555) 555-5543 CEO, Treehouse @ryancarson
Doctor, The doctor+companion@tardis.co.uk Time Lord, Gallifrey
Exampleson, Example me@example.com +1-555-555-5552 Example, Example Co. @example
Obama, Barack president.44@us.gov 555 555-5551 President, United States of America @potus44
Chalkley, Andrew andrew@teamtreehouse.com (555) 555-5553 Teacher, Treehouse @chalkers
Vader, Darth darth-vader@empire.gov (555).555.4444 Sith Lord, Galactic Empire @darthvader
suneja, amit amit.suneja007@gmail.com 444-444444 B102, City Center @programmer
Fernández de la Vega Sanz, María Teresa mtfvs@spain.gov First Deputy Prime Minister, Spanish Govt.
Here is my code:
import re
data_file = 'names.txt'
with open(data_file, 'r', encoding="utf-8") as myfile:
    data_dump = myfile.read()
# Step 1: country-code prefix such as "+1 " or "+1-"
print("___________________________________")
print(re.findall(r"(\+\d[\-\s])", data_dump))
# Step 2: optional prefix plus three digits, optionally in parentheses
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)", data_dump))
# Step 3: add a separator and three more digits
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})", data_dump))
# Step 4: add a separator and the final 4 to 6 digits
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})", data_dump))
# Number of complete matches found in step 4
print(len(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})", data_dump)))
Here is the output of my code:
___________________________________
['+1 ', '+1-']
___________________________________
[('+1 ', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '554'), ('+1-', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '444'), ('', '007'), ('', '444'), ('', '444'), ('', '444'), ('', '102')]
___________________________________
[('+1 ', '(555)', ' 555'), ('', '(555)', ' 555'), ('', '(555)', ' 555'), ('+1-', '555', '-555'), ('', '555', ' 555'), ('', '(555)', ' 555'), ('', '(555)', '.555'), ('', '444', '-444')]
___________________________________
[('+1 ', '(555)', ' 555', '-5555'), ('', '(555)', ' 555', '-5554'), ('', '(555)', ' 555', '-5543'), ('+1-', '555', '-555', '-5552'), ('', '555', ' 555', '-5551'), ('', '(555)', ' 555', '-5553'), ('', '(555)', '.555', '.4444')]
7
Answer 0: (score: 0)
You only need a small change to your last regular expression to make it work:
(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]?\d{3,6})
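Only the last capture group changes: ([\s.-]?\d{3,6})
The question mark makes the separator [\s.-] optional, and the minimum digit count drops from 4 to 3. Your last phone number, 444-444444, has no separator between its second and third digit groups, so the separator has to be optional. Once the first two groups have consumed "444" and "-444", only three digits remain, which is why the minimum must be 3 rather than 4.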
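To check the difference, here is a minimal sketch that runs both patterns over a few lines copied from the data file in the question. The helper name full_matches and the inline sample string are only for illustration; the original code reads names.txt instead. It uses re.finditer with group(0) so that each result is the whole matched number rather than a tuple of groups.

```python
import re

# A few lines copied from the question's data file (inline here for illustration).
sample = """\
Love, Kenneth kenneth@teamtreehouse.com +1 (555) 555-5555 Teacher, Treehouse @kennethlove
Obama, Barack president.44@us.gov 555 555-5551 President, United States of America @potus44
Vader, Darth darth-vader@empire.gov (555).555.4444 Sith Lord, Galactic Empire @darthvader
suneja, amit amit.suneja007@gmail.com 444-444444 B102, City Center @programmer
"""

# Pattern from the question: the last group needs a separator and 4 to 6 digits.
original = re.compile(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})")

# Pattern from the answer: the separator is optional and 3 to 6 digits are allowed.
fixed = re.compile(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]?\d{3,6})")


def full_matches(pattern, text):
    """Return each whole matched phone number (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


original_numbers = full_matches(original, sample)
fixed_numbers = full_matches(fixed, sample)

print(original_numbers)  # 444-444444 is missing here
print(fixed_numbers)     # 444-444444 is included here
```

Keep in mind that the looser last group also accepts any other string shaped like three digits, a separator, and six digits in a row, so whether it is strict enough depends on what else appears in your file.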