How to count matching domains in a list of emails and print each domain only once [python]?

Asked: 2014-07-27 12:31:23

Tags: python printing output

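I have a dataset of 10 hotmail addresses, 4 gmail addresses and 3 mail.com addresses. I want to go through the email list and print the count for each domain (hotmail, gmail, etc.) exactly once. Right now I am doing it in a very brute-force way, and I know Python has elegant, short ways to do this (e.g. itertools, islice, xrange). The output I want is: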

hotmail:10 gmail:4 mail.com:3

But I get:

Hotmail 10 Hotmail 10 ... Hotmail 10 Gmail 4 Gmail 4 Gmail 4 Gmail 4 etc.

def count_domains( emails):

    for email in emails:

        current_email = email.split("@", 2)[1] # splits at @, john@mail.com => mail.com, 
                                               #2nd index in the list
        print(current_email)
        current_domain_counter = 0
        for email2 in emails:
            if current_email == email2.split("@",2)[1]:
                current_domain_counter = current_domain_counter + 1
        #print(current_email current_domain_counter)
        print(current_domain_counter)

3 Answers:

Answer 0 (score: 2)

You can use collections.Counter:

from collections import Counter

email = ['me@mail.com', 'you@mail.com', "me@gmail.com", "you@gmail.com", "them@gmail.com",
         'you@hotmail.com', "me@hotmail.com", "you@hotmail.com", "them@hotmail.com"]


def count_domains(emails):
    c = Counter()
    for email in emails:
        # split at "@": john@mail.com => ["john", "mail.com"]; index 1 is the domain
        current_email = email.split("@", 2)[1]
        c.update([current_email])  # wrap in a list, or each letter would be counted
    print(c.most_common())  # domains ordered from most to least common
    print("gmail.com count = {}".format(c["gmail.com"]))
    print("mail.com count = {}".format(c["mail.com"]))
    print("hotmail.com count = {}".format(c["hotmail.com"]))

count_domains(email)  # the function prints its results and returns None

[('hotmail.com', 4), ('gmail.com', 3), ('mail.com', 2)]
gmail.com count = 3
mail.com count = 2
hotmail.com count = 4
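
As a rough sketch of the same idea, the counter can be built in a single pass and printed in the `domain:count` format the question asks for. The helper names `domain_counts` and `format_domain_counts` and the sample list are made up for illustration, and the code assumes every address contains an "@" (otherwise indexing `[1]` raises IndexError).

```python
from collections import Counter


def domain_counts(emails):
    """Return a Counter mapping each domain to how often it occurs."""
    # rsplit with maxsplit=1 keeps everything after the last "@";
    # lower() treats "GMAIL.com" and "gmail.com" as the same domain.
    return Counter(email.rsplit("@", 1)[1].lower() for email in emails)


def format_domain_counts(counts):
    """Render counts as 'domain:count' pairs, most common first."""
    return " ".join("{}:{}".format(domain, n) for domain, n in counts.most_common())


# Hypothetical sample data, not the asker's real dataset.
sample = [
    "a@hotmail.com", "b@hotmail.com", "c@hotmail.com",
    "d@gmail.com", "e@gmail.com",
    "f@mail.com",
]

print(format_domain_counts(domain_counts(sample)))
# → hotmail.com:3 gmail.com:2 mail.com:1
```

Returning the Counter instead of printing inside the function keeps the counting reusable; only the last line decides how the result is displayed.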

Answer 1 (score: 1)

If you put all the strings (here, the domain of each email) into a list, e.g. myList, you can make it unique with:

uniqueList = list(set(myList))

After that you can get the count of, for example, the first string with:

countFirst = myList.count(uniqueList[0])

You can put it all together like this:

[[domain,myList.count(domain)] for domain in set(myList)]
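
A small end-to-end sketch of this approach, assuming `myList` has to be filled with the domain part of each address first (the sample emails are hypothetical). Note that `list.count` rescans the whole list for every unique domain, which is fine for small inputs but slower than a single-pass counter on large ones.

```python
# Hypothetical sample data for illustration.
emails = [
    "a@hotmail.com", "b@hotmail.com", "c@hotmail.com",
    "d@gmail.com", "e@gmail.com",
    "f@mail.com",
]

# Assumption: myList holds only the domains, i.e. the text after the last "@".
myList = [email.split("@")[-1] for email in emails]

# set() removes duplicates; sorted() makes the output order deterministic,
# since set iteration order is not guaranteed.
pairs = sorted([domain, myList.count(domain)] for domain in set(myList))

for domain, count in pairs:
    print("{}:{}".format(domain, count))
```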

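Answer 2 (score: 0)

You are doing too much (at least I think so). Splitting the string is unnecessary. You can simply search each whole string for keywords such as "@gmail.com", "@hotmail.com" and "@mail.com", and increment a separate counter for each one.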

gmail_counter = 0
hotmail_counter = 0
mail_counter = 0
# Add as many counters as required
for email in emails:
    if email.find("@gmail.com") >= 0:
        gmail_counter += 1
    elif email.find("@hotmail.com") >= 0:
        hotmail_counter += 1
    elif email.find("@mail.com") >= 0:
        mail_counter += 1
    # ...
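
One caveat with a plain substring search: `find("@mail.com")` also matches an address such as "e@mail.community". A possible variation, sketched below, checks the end of the string with `str.endswith` and keeps the counters in a dict so no new variable is needed per domain. The function name `count_known_domains` and its default suffix tuple are assumptions for illustration, not part of the answer above.

```python
def count_known_domains(emails, suffixes=("@gmail.com", "@hotmail.com", "@mail.com")):
    """Count emails ending in each known suffix; other addresses are ignored."""
    counts = {suffix: 0 for suffix in suffixes}
    for email in emails:
        lowered = email.lower()  # compare case-insensitively
        for suffix in suffixes:
            if lowered.endswith(suffix):
                counts[suffix] += 1
                break  # an address can match at most one suffix
    return counts


# Hypothetical sample data; the last address must not count as mail.com.
sample = ["a@gmail.com", "b@GMAIL.com", "c@hotmail.com", "d@mail.com", "e@mail.community"]
print(count_known_domains(sample))
```

This still requires listing the domains up front, so it only suits cases where the set of domains of interest is known in advance.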