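Experts, I am trying to count the email addresses in a maillog file and how many times each one is repeated. I can already do this with a regular expression via re.search or re.match, but I would like to do it with re.findall, which I am currently experimenting with. Any suggestions would be much appreciated.
1) The code: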
# cat maillcount31.py
#!/usr/bin/python
import re
#count = 0
mydic = {}
counts = mydic
fmt = " %-32s %-15s"
log = open('kkmail', 'r')
for line in log.readlines():
    myre = re.search('.*from=<(.*)>,\ssize', line)
    if myre:
        name = myre.group(1)
        if name not in mydic.keys():
            mydic[name] = 0
        mydic[name] +=1
for key in counts:
    print fmt % (key, counts[key])
2) Output from the current code:
# python maillcount31.py
root@MyServer1.myinc.com 13
User01@MyServer1.myinc.com 14
Answer 0 (score: 2)
Hope this helps...
import re
from collections import Counter
# Adjust the pattern to your file or line format. `line` is the text being searched.
# If you call findall() on each line, the returned lists need to be combined.
emails = re.findall(r'.*from=<(.*)>,\ssize', line)
result = Counter(emails)  # type is <class 'collections.Counter'>
dict(result)  # convert to a regular dict
re.findall() returns a list. See "How can I count the occurrences of a list item in Python?" for other ways to count the items in the returned list.
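To make the findall() approach concrete, here is a minimal sketch written for Python 3. The sample log text, the name count_senders, and the tightened pattern ([^>]* in place of .*) are assumptions for illustration; the real maillog line format may differ, so adjust the pattern accordingly.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of the log file;
# the real maillog line format may differ.
sample_log = """\
Jan  1 10:00:01 MyServer1 sendmail[101]: q01: from=<root@MyServer1.myinc.com>, size=512, class=0
Jan  1 10:00:02 MyServer1 sendmail[102]: q02: from=<User01@MyServer1.myinc.com>, size=734, class=0
Jan  1 10:00:03 MyServer1 sendmail[103]: q03: to=<someone@example.com>, delay=00:00:01
Jan  1 10:00:04 MyServer1 sendmail[104]: q04: from=<root@MyServer1.myinc.com>, size=128, class=0
"""

# [^>]* stops at the first '>', so each match holds exactly one address.
SENDER_RE = re.compile(r'from=<([^>]*)>,\ssize')


def count_senders(text):
    """Return a Counter mapping each sender address to its number of occurrences."""
    # findall() returns the list of captured groups, which Counter tallies directly.
    return Counter(SENDER_RE.findall(text))


counts = count_senders(sample_log)
for address, n in sorted(counts.items()):
    print(" %-32s %-15s" % (address, n))
```

Running findall() once over the whole text removes the explicit loop and the dictionary bookkeeping, at the cost of reading the entire file into memory first.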
By the way, here is a handy feature of Counter:
>>> tmp1 = Counter(re.findall('from=<([^\s]*)>', "from=<usr1@gmail.com>, from=<usr2@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>,") )
>>> tmp1
Counter({'usr1@gmail.com': 4, 'usr2@gmail.com': 1})
>>> tmp2 = Counter(re.findall('from=<([^\s]*)>', "from=<usr2@gmail.com>, from=<usr3@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>,") )
>>> dict(tmp1+tmp2)
{'usr2@gmail.com': 2, 'usr1@gmail.com': 7, 'usr3@gmail.com': 1}
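The same combining behaviour can be applied one line at a time, which avoids holding a large file in memory. This is a minimal sketch for Python 3; the function name count_senders_by_line and the sample lines are hypothetical, and Counter.update() is used here instead of + so that no new Counter is created for every line.

```python
import re
from collections import Counter

# Same idea as the pattern in the session above: capture the text between from=< and >.
SENDER_RE = re.compile(r'from=<([^>\s]*)>')


def count_senders_by_line(lines):
    """Accumulate sender counts over any iterable of lines."""
    total = Counter()
    for line in lines:
        # update() adds the counts of the matches found on this line to the running total.
        total.update(SENDER_RE.findall(line))
    return total


# Hypothetical stand-in for the lines of a large log file.
sample_lines = [
    "from=<usr1@gmail.com>, from=<usr2@gmail.com>,",
    "from=<usr1@gmail.com>,",
    "no sender on this line",
]

totals = count_senders_by_line(sample_lines)
print(dict(totals))
```

Because an open file object yields its lines lazily, passing the file itself (for example the object returned by open('kkmail')) to count_senders_by_line should process the log without reading it all at once.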
So, if the file is very large, we can count each line separately and combine the results with Counter.
Answer 1 (score: 1)
Have you considered using pandas? It can give you a nice table of results without using regular-expression commands.
import pandas as pd
# email_list is assumed to be a list of addresses already extracted from the log
emails = pd.Series(email_list)
individual_emails = emails.unique()
# Make a table with one row per address and a zeroed tally
tally = pd.DataFrame({'email': individual_emails, 'count': [0] * len(individual_emails)})
for item in tally.index:
    address = tally.iloc[item, 0]
    total = len(emails[emails == address])  # number of entries equal to this address
    tally.iloc[item, 1] = total
print(tally)
Answer 2 (score: 1)
I hope the code at the bottom helps.
However, there are generally three points to note:
#!/usr/bin/python
import re
from collections import Counter
fmt = " %-32s %-15s"
filename = 'kkmail'
# Extract the email addresses
email_list = []
with open(filename, 'r') as log:
    for line in log.readlines():
        _re = re.search(r'.*from=<(.*)>,\ssize', line)
        if _re:
            name = _re.group(1)
            email_list.append(name)
# Count the email addresses
counts = dict(Counter(email_list)) # List to dict of counts: {'a':3, 'b':7,...}
# items() and print() with a single argument work on both Python 2 and Python 3
for key, val in counts.items():
    print(fmt % (key, val))