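I have a dataset of 10 hotmail email addresses, 4 gmail and 3 mail.com. I want to go through the email list and print the count for each domain (hotmail, gmail, etc.). But I'm doing it in a very brute-force way. I know Python has elegant, short code for this kind of thing (e.g. itertools, islice, xrange).
Expected output:
hotmail:10 gmail:4 mail.com:3
But I get:
hotmail 10 hotmail 10 ... hotmail 10 gmail 4 gmail 4 gmail 4 gmail 4 etc.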
    def count_domains(emails):
        for email in emails:
            current_email = email.split("@", 2)[1]  # splits at @, john@mail.com => mail.com,
                                                    # 2nd index in the list
            print(current_email)
            current_domain_counter = 0
            for email2 in emails:
                if current_email == email2.split("@", 2)[1]:
                    current_domain_counter = current_domain_counter + 1
            #print(current_email current_domain_counter)
            print(current_domain_counter)
Answer 0 (score: 2)
You can use collections.Counter:
    from collections import Counter

    email = ['me@mail.com', 'you@mail.com', "me@gmail.com", "you@gmail.com", "them@gmail.com",
             'you@hotmail.com', "me@hotmail.com", "you@hotmail.com", "them@hotmail.com"]

    def count_domains(emails):
        c = Counter()
        for email in emails:
            # splits at @, john@mail.com => mail.com (2nd item in the list)
            current_email = email.split("@", 2)[1]
            c.update([current_email])  # wrap in list or will end up counting each letter
        print(c.most_common())  # print domains, most common first
        print("gmail.com count = {}".format(c["gmail.com"]))
        print("mail.com count = {}".format(c["mail.com"]))
        print("hotmail.com count = {}".format(c["hotmail.com"]))

    count_domains(email)  # the function prints its results and returns None
Output:
    [('hotmail.com', 4), ('gmail.com', 3), ('mail.com', 2)]
    gmail.com count = 3
    mail.com count = 2
    hotmail.com count = 4
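For what it's worth, the loop above can probably be collapsed by handing a generator straight to Counter. This is only a minimal sketch: the function name count_domains_compact and the sample list are made up for illustration, and it uses rpartition instead of split, so an address without an "@" ends up counted under the whole string rather than raising an error.

```python
from collections import Counter

def count_domains_compact(emails):
    """Return a Counter mapping each domain to how often it occurs."""
    # rpartition splits on the last "@"; item [2] is everything after it.
    return Counter(email.rpartition("@")[2] for email in emails)

# Hypothetical sample data, not the asker's real dataset
sample = ["me@mail.com", "you@gmail.com", "them@gmail.com", "you@hotmail.com"]

# Print in the "domain:count" shape the question asks for
for domain, count in count_domains_compact(sample).most_common():
    print("{}:{}".format(domain, count))
```

Returning the Counter instead of printing inside the function leaves the output format up to the caller.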
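Answer 1 (score: 1)
If you put all the domain strings into a list, e.g. myList, you can make it unique with
    uniqueList = list(set(myList))
After that you can get, for example, the count of the first string with
    countFirst = myList.count(uniqueList[0])
You can put it all together like this:
    [[domain, myList.count(domain)] for domain in set(myList)]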
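To make that concrete, here is a small end-to-end sketch of the same idea. The sample addresses are invented, and it assumes myList has to hold the domains only, so each address is split first.

```python
# Hypothetical sample data
emails = ["me@mail.com", "you@gmail.com", "them@gmail.com"]

# Reduce each address to its domain (the part after "@")
myList = [email.split("@")[1] for email in emails]

# One [domain, count] pair per distinct domain
pairs = [[domain, myList.count(domain)] for domain in set(myList)]

# set() has no guaranteed order, so sort before printing
print(sorted(pairs))
```

Note that myList.count rescans the whole list once per distinct domain, which is fine for a handful of domains but slower than a Counter on large lists.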
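Answer 2 (score: 0)
You are doing too much (at least I think so). Splitting the string is unnecessary. You can just check the whole string for keywords such as "@gmail.com", "@hotmail.com", "@mail.com", and increment a separate counter for each one.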
    gmail_counter = 0
    hotmail_counter = 0
    mail_counter = 0
    # Add as many counters as required

    # emails is the list of address strings
    for email in emails:
        if email.find("@gmail.com") >= 0:
            gmail_counter += 1
        elif email.find("@hotmail.com") >= 0:
            hotmail_counter += 1
        elif email.find("@mail.com") >= 0:
            mail_counter += 1
        # ...
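If the list of domains grows, one counter variable per domain gets unwieldy. A possible variation, sketched under my own assumptions: the helper name count_known_domains is hypothetical, and it swaps find for endswith so that only the end of the address is matched.

```python
def count_known_domains(emails, domains):
    """Count addresses per known domain; unknown domains are ignored."""
    counters = {domain: 0 for domain in domains}
    for email in emails:
        for domain in domains:
            # Match "@domain" at the very end of the address only
            if email.endswith("@" + domain):
                counters[domain] += 1
                break  # an address belongs to at most one domain
    return counters

# Hypothetical sample data
known = ["gmail.com", "hotmail.com", "mail.com"]
sample = ["me@mail.com", "you@gmail.com", "them@gmail.com", "x@example.org"]
print(count_known_domains(sample, known))
```

Like the original answer, this only counts domains you list up front, so it suits the case where the set of domains is known in advance.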