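Distributing a large number of elements into a list of dicts

Time: 2017-08-10 11:18:05

Tags: python python-3.x list dictionary

Say I have a huge list of phone numbers, email addresses, and websites, each belonging to a particular organization, company, or person. The number of phones, emails, or websites varies from one contact to the next, and some contacts may have no phone number, no email address, and so on.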

a_list = [
    "+99112233",
    "+39383",
    "www.johndoe.com",
    "info@JohnDoe.com",
    "+9933933",
    "+99883399",
    "www.someother.com",
    "www.tt.com",
    "support@tt.com",
    "info@tt.com",
]

I want to split them into dictionaries like this:

contacts = [
    {
        'phones': ["+99112233", "+39383"],
        'websites': ["www.johndoe.com"],
        'emails': ['info@JohnDoe.com'],
    },
    {
        'phones': ["+9933933", "+99883399"],
        'websites': ['www.someother.com'],
        'emails': [],
    },
    {
        'phones': [],
        'websites': ['www.tt.com'],
        'emails': ['support@tt.com', 'info@tt.com'],
    },
]

This is my code so far:

# True once a website or email has been seen for the current contact
push_flag = False
contacts = []
phones = []
emails = []
webs = []
for text in a_list:
    if text.startswith("+"):
        # A phone number that follows a website or email starts a new contact
        if push_flag:
            contacts.append({
                'phones': phones,
                'websites': webs,
                'emails': emails,
            })
            phones = []
            webs = []
            emails = []
            push_flag = False
        phones.append(text)
    elif text.startswith("www"):
        push_flag = True
        webs.append(text)
    elif "@" in text:
        push_flag = True
        emails.append(text)

# The loop never pushes the final contact, so append it here
contacts.append({
    'phones': phones,
    'websites': webs,
    'emails': emails,
})

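1 Answer:

Answer 0: (score: 1)

A couple of things may help you simplify the logic. First, I use a list of (regex, category) pairs to identify whether each element is a phone number, a website, or an email address. This approach is nice because it lets you add further kinds of data easily, without having to touch the structure of the parsing code. Second, defaultdict(list) seems like a really suitable structure for each contact.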
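As a small illustration of those two ideas in isolation, the sketch below classifies single entries with (regex, category) pairs and collects them in a defaultdict(list). The helper name classify is hypothetical and only used here; the patterns mirror the ones in the answer's code.

```python
import re
from collections import defaultdict

# (regex, category) pairs; adding a new kind of data means adding one pair.
CATEGORY_PAIRS = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    (re.compile(r'^.+?@.+\.[A-Za-z]+$'), 'emails'),
]


def classify(entry):
    """Hypothetical helper: return the first matching category, or None."""
    for regex, category in CATEGORY_PAIRS:
        if regex.match(entry):
            return category
    return None


# defaultdict(list) creates an empty list the first time a key is used,
# so no category has to be initialised up front.
contact = defaultdict(list)
for entry in ["+99112233", "www.johndoe.com", "info@JohnDoe.com"]:
    category = classify(entry)
    if category is not None:
        contact[category].append(entry)

print(dict(contact))
# {'phones': ['+99112233'], 'websites': ['www.johndoe.com'], 'emails': ['info@JohnDoe.com']}
```

One thing to keep in mind with defaultdict: merely reading a missing key (for example contact['faxes']) also creates it as an empty list.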

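import re
from collections import defaultdict
from more_itertools import peekable  # third-party: pip install more-itertools

# Raw strings keep the backslashes in the patterns from being read as string escapes.
# The order of the pairs is the order in which categories are expected within a contact.
category_pairs = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    (re.compile(r'^.+?@.+\.[A-Za-z]+$'), 'emails'),
]

contacts = []
iterator = peekable(a_list)

# A peekable is truthy while it still has items left
while iterator:
    current = defaultdict(list)
    for regex, category in category_pairs:
        # Consume the run of consecutive entries that belong to this category
        while iterator and regex.match(iterator.peek()):
            current[category].append(next(iterator))
    if not current:
        # The next entry matched no pattern; skipping it is an assumption
        # made here so that the loop always makes progress
        next(iterator)
        continue
    contacts.append(current)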

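This code makes one assumption: within each contact, phone numbers, websites, and email addresses occur in that order, and entries are grouped accordingly. For the sample list in the question this produces two contacts rather than three, because www.someother.com and www.tt.com are adjacent websites and therefore land in the same group.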
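The ordering assumption on its own cannot separate two adjacent websites, which the expected output in the question requires. The following is a minimal standard-library sketch of one way to get there, not the answer author's method. It assumes a new contact starts whenever the category order goes backwards (for example an email followed by a phone), and, as an extra rule inferred only from the question's sample, whenever a second website appears. The names rank_and_category, group_contacts, and one_website_per_contact are hypothetical, and unrecognised entries are simply skipped.

```python
import re

# Patterns as in the answer; the index in this list is the category's rank.
CATEGORY_PAIRS = [
    (re.compile(r'^\+[0-9]+$'), 'phones'),
    (re.compile(r'^www\..*?\.[A-Za-z]+$'), 'websites'),
    (re.compile(r'^.+?@.+\.[A-Za-z]+$'), 'emails'),
]


def rank_and_category(entry):
    """Return (rank, category) for the first matching pattern, or None."""
    for rank, (regex, category) in enumerate(CATEGORY_PAIRS):
        if regex.match(entry):
            return rank, category
    return None


def group_contacts(entries, one_website_per_contact=True):
    """Group a flat list of entries into contact dicts (sketch, see assumptions)."""
    contacts = []
    current = None
    last_rank = -1
    for entry in entries:
        match = rank_and_category(entry)
        if match is None:
            continue  # assumption: ignore entries that match no pattern
        rank, category = match
        starts_new = (
            current is None
            or rank < last_rank  # category order went backwards
            or (one_website_per_contact
                and category == 'websites'
                and len(current['websites']) > 0)  # assumed extra rule
        )
        if starts_new:
            # Every contact gets all three keys, even if some stay empty
            current = {name: [] for _, name in CATEGORY_PAIRS}
            contacts.append(current)
        current[category].append(entry)
        last_rank = rank
    return contacts


a_list = [
    "+99112233", "+39383", "www.johndoe.com", "info@JohnDoe.com",
    "+9933933", "+99883399", "www.someother.com",
    "www.tt.com", "support@tt.com", "info@tt.com",
]

for contact in group_contacts(a_list):
    print(contact)
```

With one_website_per_contact=False the sketch falls back to the pure ordering assumption and merges the last two groups. Whether "one website per contact" holds for real data is something only the data's owner can confirm.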