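I want to improve the readability and formatting of my code. The code below works, but I feel it could be tighter, and I can't seem to get it to work any other way. The idea is to read a .txt file, find the email "From " lines, and organize the data by how many messages were sent in each hour.
Here is a sample of the lines I'm looking for in the text:
From email@emailaddress.com Sat Jan 5 09:14:16 2008
Here is my code as it stands today.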
fname = input("Enter file:")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
time = list()
hours = list()
hr = dict()
for line in fh:
    if not line.startswith("From "): continue
    words = line.split()
    time.append(words[5])
for i in time:
    i.split(":")
    hours.append(i[:2])
for h in hours:
    hr[h]=hr.get(h, 0)+1
l = list()
for k,v in hr.items():
    l.append((k,v))
l.sort()
for k, v in l:
    print (k,v)
Answer 0 (score: 1)
Here is (I think) functionally equivalent code:
from collections import Counter
fname = input("Enter file: ")
if fname == "":
    fname = "mbox-short.txt"
hour_counts = Counter()
with open(fname) as f:
    for line in f:
        if not line.startswith("From "):
            continue
        words = line.split()
        time = words[5]
        hour = time[:2]
        hour_counts[hour] += 1
for hour, count in sorted(hour_counts.items()):
    print(hour, count)
You may also want to parse the mbox format with an existing Python library rather than rolling your own.
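As one possible illustration of that suggestion, here is a minimal sketch using the standard library's mailbox module. It assumes the file is a well-formed mbox in which every message starts with a "From " envelope line like the sample in the question. The function name count_hours is made up for this example.

```python
import mailbox
from collections import Counter
from contextlib import closing


def count_hours(path):
    """Count messages per hour, based on each message's "From " envelope line."""
    counts = Counter()
    # create=False: raise an error instead of creating an empty file
    # when the path does not exist
    with closing(mailbox.mbox(path, create=False)) as box:
        for message in box:
            # get_from() returns the envelope line without the leading "From ",
            # e.g. "email@emailaddress.com Sat Jan 5 09:14:16 2008"
            fields = message.get_from().split()
            if len(fields) < 5:
                continue  # envelope line has no timestamp; skip it
            hour = fields[4].split(":")[0]
            counts[hour] += 1
    return counts
```

Calling count_hours("mbox-short.txt") and printing sorted(result.items()) should give the same hour/count pairs as the loop above, provided the file parses as expected.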
Answer 1 (score: 0)
Just a few tips (don't try this at home, this is very bad code :D, but it shows some Python constructs worth learning: operator, defaultdict, and list comprehensions):
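from collections import defaultdict
import operator
fname = "mbox-short.txt"  # assumed here; in the question it comes from input()
hr = defaultdict(int)
with open(fname) as fh:
    # take the hour part ("09") of the time field on each "From " line
    hours = [data.split()[5].split(":")[0] for data in fh if data.startswith("From ")]
for h in hours:
    hr[h] += 1
# itemgetter(1) sorts by count, not by hour
sorted_hr = sorted(hr.items(), key=operator.itemgetter(1))
for k, v in sorted_hr:
    print(k, v)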
Answer 2 (score: 0)
A regular-expression approach would look something like this:
import re
hours=[]
with open("new_file") as textfile:
    for line in textfile:
        if re.search("^From [A-Za-z0-9]+[@][a-zA-Z]+[.][a-z]{3}",line):
            hours.append(re.sub(".*([0-9]{2})[:][0-9]{2}[:][0-9]{2} [0-9]{4}.*","\\1",line.strip()))
hours.sort()
print(hours)
Example
If the following data is in the file new_file:
kwejrhkhwr
From johnking@emailaddress.com Sat Jan 5 09:14:16 2008
From JohnPublic@emailaddress.com Sat Dec 31 01:40:16 2015
Something not needed here
Something not needed here
From JohnPublic125@emailaddress.com Sat Oct 25 44:03:10 2015
Output, in ascending order by hour:
['01', '09', '44']
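The list above holds one entry per matching line, while the question asks for a frequency per hour. A hedged sketch of how the regex approach might be extended to tally hours follows. The pattern FROM_LINE and the function count_hours_regex are illustrative names, not part of the answer, and the pattern is looser than the one used above (it accepts any non-space characters around the @).

```python
import re
from collections import Counter

# Illustrative pattern: "From ", an address, then any text up to
# HH:MM:SS followed by a four-digit year; the hour is captured.
FROM_LINE = re.compile(r"^From \S+@\S+ .*?(\d{2}):\d{2}:\d{2} \d{4}")


def count_hours_regex(lines):
    """Return a Counter mapping each two-digit hour string to its frequency."""
    counts = Counter()
    for line in lines:
        match = FROM_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file object works because the function only iterates over lines, for example count_hours_regex(open("new_file")). Like the code above, it does not check that the hour is a valid value, so a line containing 44:03:10 is still counted under '44'.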