Question

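Write a program to read through mbox-short.txt and figure out the distribution of messages by hour of the day.

You can pull the hour out of the 'From ' line by finding the time and then splitting that string a second time using a colon.

Once you have accumulated the counts for each hour, print out the counts, sorted by hour, as shown below.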

name = input('Enter file name: ')
if len(name)<1:
    name = 'mbox-short.txt'
hand = open(name)
counts = dict()


for line in hand:
    if not line.startswith('From '):
        continue
    words = line.split(' ')
    words = words[6]
    #print(words.split(':'))
    hour = words.split(':')[0]
    counts[hour] = counts.get(hour, 0) + 1
for k, v in sorted(counts.items()):
    print(k, v)

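I had to use [6] to pick the time out of the split email line. But shouldn't it be 5?

The line I need to extract the hour from looks like this: From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 200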

Answer 1

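Yes, you are right, the index should be 5 in this example. By the way, for the counting part, the collections module already provides a Counter class. You can rewrite your code like this: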

from collections import Counter

counter = Counter()

name = input('Enter file name: ')
if len(name) < 1:
    name = 'mbox-short.txt'

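with open(name) as fp:
    for line in fp:
        # 'From ' (with the trailing space) skips the 'From:' header lines
        if line.startswith('From '):
            # split() with no argument collapses repeated spaces
            words = line.split()
            time = words[5]
            hour = time.split(':')[0]
            counter[hour] += 1
# sort numerically by hour rather than as strings
for hour, freq in sorted(counter.items(), key=lambda x: int(x[0])):
    print(hour, freq)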
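
One possible explanation for the [6], sketched under an assumption: if the file pads single-digit days with an extra space, then split(' '), as used in the question's loop, yields an empty string between the month and the day, and that shifts the time from index 5 to index 6. Calling split() with no argument collapses runs of whitespace, so the time stays at index 5. The sample line below is hypothetical and made up for illustration.

```python
# Hypothetical sample line; the double space before the day is an assumption
# about how the file pads single-digit days.
padded = 'From someone@example.com Sat Jan  5 09:14:16 2008\n'

by_space = padded.split(' ')    # splits on every single space
by_whitespace = padded.split()  # splits on runs of whitespace

print(by_space[4:7])       # ['', '5', '09:14:16']
print(by_whitespace[4:6])  # ['5', '09:14:16']

# Both approaches reach the same hour, only at different indices.
hour_from_space = by_space[6].split(':')[0]
hour_from_whitespace = by_whitespace[5].split(':')[0]
print(hour_from_space, hour_from_whitespace)  # 09 09
```

If the lines in your file really do contain only single spaces, both calls give the same list and index 5 is the time in either case.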

You can also get the most common items like this:

counter.most_common(10)  # returns the 10 most common (hour, count) pairs
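
As a small illustration of what most_common returns, here is a sketch using a made-up list of hour strings (not data from mbox-short.txt):

```python
from collections import Counter

# Made-up sample of hour strings, for illustration only.
hours = ['09', '10', '09', '11', '09', '10']

counter = Counter(hours)

# most_common(n) returns a list of (item, count) tuples,
# ordered from the highest count to the lowest.
top_two = counter.most_common(2)
print(top_two)  # [('09', 3), ('10', 2)]
```

The result is ordered by frequency, so it complements the hour-sorted loop rather than replacing it.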
