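Say I have a huge list of phone numbers, email addresses, and websites, each belonging to a particular organization, company, or person. The number of phones, emails, and websites varies, and some contacts may have no phone number, no email address, and so on.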
a_list = [
    "+99112233",
    "+39383",
    "www.johndoe.com",
    "info@JohnDoe.com",
    "+9933933",
    "+99883399",
    "www.someother.com",
    "www.tt.com",
    "support@tt.com",
    "info@tt.com",
]
I want to split them into a list of dictionaries like this:
contacts = [
{ 'phones': ["+99112233", "+39383"],
'websites': ["www.johndoe.com"],
'emails': ['info@JohnDoe.com'],
},
{
'phones': ["+9933933","+99883399"],
'websites': ['www.someother.com'],
'emails': []
},
{
'phones': [],
'websites': ['www.tt.com'],
'emails': ['support@tt.com', 'info@tt.com']
}
]
Here is my code so far:
push_flag = False
contacts = []
phones = []
emails = []
webs = []
for contact in a_list:
    text = contact
    if text[0] == "+":
        if push_flag:
            contacts.append({
                'phones': phones,
                'webs': webs,
                'emails': emails,
            })
            phones = []
            webs = []
            emails = []
            push_flag = False
        phones.append(text)
    elif text[0:3] == "www":
        push_flag = True
        webs.append(text)
    elif "@" in text:
        push_flag = True
        emails.append(text)
contacts.append({
    'phones': phones,
    'webs': webs,
    'emails': emails,
})
Answer 0 (score: 1)
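A few things might help you simplify the logic. First, I use a list of (regular expression, category) pairs to identify whether each element is a phone number, a website, or an email address. This approach is nice because it lets you add other kinds of data easily without touching the structure of the parsing code. Second, defaultdict(list) seems like a really fitting structure for each contact.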
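import re
from collections import defaultdict
from more_itertools import peekable  # third-party package (more-itertools)

# (regex, category) pairs, listed in the order the categories are expected to appear
category_pairs = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    (re.compile(r'^.+?@.+\.[A-Za-z]+$'), 'emails'),
]
contacts = []
current = defaultdict(list)
iterator = peekable(a_list)
entry = next(iterator)
# Caveats: an entry that matches no regex is never consumed, so the outer loop
# would spin, and a trailing group made of a single entry is not appended.
while iterator.peek(False):
    for regex, category in category_pairs:
        # Consume consecutive entries that belong to the current category
        while regex.match(entry):
            current[category].append(entry)
            if not iterator.peek(False):
                break
            entry = next(iterator)
    contacts.append(current)
    current = defaultdict(list)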
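This code makes one assumption: within each contact, phone numbers, websites, and email addresses occur in that order, and the entries are grouped accordingly.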
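For comparison, here is a minimal standard-library sketch of the same ordering rule, in case you would rather not depend on more_itertools. It is only one possible way to write it, not a drop-in replacement for the code above. The names `classify` and `group_contacts` are made up for this example, and skipping entries that match no pattern is my own assumption about what you would want. A new contact starts whenever an entry's category comes earlier in the list than the previous entry's category.

```python
import re

# Same three patterns as in the answer, in the order they are assumed to appear.
CATEGORY_PAIRS = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    (re.compile(r'^.+?@.+\.[A-Za-z]+$'), 'emails'),
]


def classify(entry):
    """Return the position of the first matching category, or None (hypothetical helper)."""
    for rank, (regex, _name) in enumerate(CATEGORY_PAIRS):
        if regex.match(entry):
            return rank
    return None


def new_contact():
    """Build an empty contact that already holds a list for every category."""
    return {name: [] for _regex, name in CATEGORY_PAIRS}


def group_contacts(entries):
    """Group a flat list of entries into contacts (hypothetical helper)."""
    contacts = []
    current = new_contact()
    last_rank = -1
    for entry in entries:
        rank = classify(entry)
        if rank is None:
            continue  # assumption: silently skip entries that match no pattern
        if rank < last_rank:
            # The category order went backwards, so a new contact begins here.
            contacts.append(current)
            current = new_contact()
        current[CATEGORY_PAIRS[rank][1]].append(entry)
        last_rank = rank
    if any(current.values()):
        contacts.append(current)  # keep the final, still-open contact
    return contacts
```

Calling `group_contacts(a_list)` on the list from the question gives two contacts rather than three: "www.tt.com" directly follows "www.someother.com", so ordering alone cannot tell that it starts a new contact. Separating those would need an extra rule, for example comparing domain names, which is outside what this sketch attempts.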