Python code readability help

Date: 2015-11-19 21:02:37

Tags: python python-3.x type-slicing

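I want to improve the readability and formatting of my code. I have this code, which works, but I feel it could be tighter than this, and I can't seem to get it to work any other way. The idea is to read a .txt file, find the lines that record an incoming email, and tally them by the hour in which each was sent.

Here is an example of the line I'm looking for in the text: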

  

From email@emailaddress.com Sat Jan 5 09:14:16 2008

Here is my code as it stands today.

fname = input("Enter file:")
if len(fname) <1 : fname = "mbox-short.txt"
fh = open(fname)
time = list()
hours = list()
hr = dict()

for line in fh:
        if not line.startswith("From "): continue
        words = line.split()
        time.append(words[5])

for i in time:
        i.split(":")
        hours.append(i[:2])

for h in hours:
        hr[h]=hr.get(h, 0)+1

l = list()
for k,v in hr.items():
        l.append((k,v))
l.sort()
for k, v in l:
        print (k,v)

3 Answers:

Answer 0: (score: 1)

Here is (I think) functionally equivalent code:

from collections import Counter

fname = input("Enter file: ")
if fname == "":
    fname = "mbox-short.txt"

hour_counts = Counter()
with open(fname) as f:
    for line in f:
        if not line.startswith("From "):
            continue
        words = line.split()
        time = words[5]
        hour = time[:2]
        hour_counts[hour] += 1

for hour, count in sorted(hour_counts.items()):
    print(hour, count)

You may also want to parse the mbox format with an existing Python library rather than rolling your own parser.
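As a rough illustration of that suggestion, the sketch below uses the standard library's `mailbox` module. The helper name `count_hours` is made up for this example, and it assumes every separator line has the same layout as the sample line in the question (address, weekday, month, day, time, year); a file with a different layout would need different indexing.

```python
import mailbox
from collections import Counter


def count_hours(mbox_path):
    """Count messages per hour, based on each message's "From " separator line.

    Hypothetical helper for illustration; assumes the separator looks like
    "From email@emailaddress.com Sat Jan 5 09:14:16 2008".
    """
    hour_counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # get_from() returns the separator line without the leading "From ",
            # so the time is the fifth field (index 4) rather than index 5.
            fields = message.get_from().split()
            if len(fields) < 5:
                continue  # unexpected separator layout; skip it
            hour_counts[fields[4][:2]] += 1
    finally:
        box.close()
    return hour_counts
```

Calling `sorted(count_hours("mbox-short.txt").items())` should then give the same (hour, count) pairs that the loop above prints, provided the file is a well-formed mbox.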

Answer 1: (score: 0)

Just a few hints (don't try this at home, this is very bad code :D, but it shows some Python constructs worth learning: operator, defaultdict, and list comprehensions):

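from collections import defaultdict
import operator

# fname was not defined in this snippet; reuse the question's default file name
fname = "mbox-short.txt"

hr = defaultdict(int)

with open(fname) as fh:
    # take the hour ("09") from the time field ("09:14:16") of each "From " line;
    # [0] yields a string, whereas the slice [:2] yields a list, which cannot be a dict key
    hours = [data.split()[5].split(":")[0] for data in fh if data.startswith("From ")]

for h in hours:
    hr[h] += 1

# itemgetter(1) sorts by count; use itemgetter(0) to sort by hour as the question does
sorted_hr = sorted(hr.items(), key=operator.itemgetter(1))
for k, v in sorted_hr:
    print(k, v)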
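Note that this answer sorts with `operator.itemgetter(1)`, that is, by count, while the question's code sorts by hour. A small sketch of the difference, using made-up hour strings (the sample values are assumptions for illustration, not taken from any real mailbox):

```python
from collections import defaultdict
import operator

# Hypothetical hour strings, as they might come out of the list comprehension
sample_hours = ["09", "11", "09", "04", "11", "09"]

hr = defaultdict(int)  # missing keys start at 0
for h in sample_hours:
    hr[h] += 1

# itemgetter(0) picks the key (hour); itemgetter(1) picks the value (count)
by_hour = sorted(hr.items(), key=operator.itemgetter(0))
by_count = sorted(hr.items(), key=operator.itemgetter(1))

print(by_hour)   # [('04', 1), ('09', 3), ('11', 2)]
print(by_count)  # [('04', 1), ('11', 2), ('09', 3)]
```

To reproduce the question's hour-ordered output, `itemgetter(0)` (or a plain `sorted(hr.items())`) is probably the one you want.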

Answer 2: (score: 0)

A regular-expression approach would look something like this:

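import re

hours = []
with open("new_file") as textfile:
    for line in textfile:
        # keep only lines that start with "From <address>"
        if re.search(r"^From [A-Za-z0-9]+[@][a-zA-Z]+[.][a-z]{3}", line):
            # replace the whole line with the two-digit hour captured from "HH:MM:SS YYYY"
            hours.append(re.sub(r".*([0-9]{2})[:][0-9]{2}[:][0-9]{2} [0-9]{4}.*", r"\1", line.strip()))

hours.sort()
print(hours)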

Example: if the following data is in the file new_file

kwejrhkhwr
From johnking@emailaddress.com Sat Jan 5 09:14:16 2008
From JohnPublic@emailaddress.com Sat Dec 31 01:40:16 2015
Something not needed here
Something not needed here
From JohnPublic125@emailaddress.com Sat Oct 25 44:03:10 2015

then the output, in ascending order by hour, is

['01', '09', '44']
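The regex version collects the hours but does not count them, and it accepts an impossible hour such as 44. A possible variation, sketched below, captures the hour with a single compiled pattern and tallies it with `Counter`. The names `FROM_LINE` and `count_hours_regex` are invented for this example, the pattern is looser than the one in the answer, and dropping hours above 23 is an added assumption rather than something the answer does.

```python
import re
from collections import Counter

# Matches lines such as "From johnking@emailaddress.com Sat Jan 5 09:14:16 2008"
# and captures the two-digit hour. Hypothetical pattern for illustration.
FROM_LINE = re.compile(r"^From \S+@\S+ .*?(\d{2}):\d{2}:\d{2} \d{4}\s*$")


def count_hours_regex(lines):
    """Tally matching "From " lines by hour, ignoring hours outside 00-23."""
    counts = Counter()
    for line in lines:
        match = FROM_LINE.match(line)
        if match is None:
            continue  # not a "From <address> ... HH:MM:SS YYYY" line
        hour = match.group(1)
        if int(hour) > 23:
            continue  # assumption: treat impossible hours as bad data
        counts[hour] += 1
    return counts


# The same sample data as in the example above
sample = [
    "kwejrhkhwr",
    "From johnking@emailaddress.com Sat Jan 5 09:14:16 2008",
    "From JohnPublic@emailaddress.com Sat Dec 31 01:40:16 2015",
    "Something not needed here",
    "Something not needed here",
    "From JohnPublic125@emailaddress.com Sat Oct 25 44:03:10 2015",
]

for hour, count in sorted(count_hours_regex(sample).items()):
    print(hour, count)
```

With the sample data this prints `01 1` and `09 1`; the line with hour 44 is skipped by the range check.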