Question

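I wrote a short Python script that searches a log file for URLs together with their HTTP status codes. The script works as expected and counts how often each URL occurs with a specific HTTP status code. The resulting dictionary is unsorted, which is why I sort the data afterwards by the dictionary's values. That part of the script also works as expected, and I get a sorted list of URLs and counters that looks like this: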

([('http://example1.com"', 1), ('http://example2.com"', 5), ('http://example3.com"', 10)])

I just want to make it more readable and print the list line by line:

http://example1.com      1  
http://example2.com      5  
http://example3.com      10

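I only started using Python two weeks ago and can't find a solution. I tried several solutions I found on Stack Overflow, but none of them worked. My current solution prints every URL on its own line but does not show the count. I can't use a comma as a separator because some URLs in the log file contain commas. Sorry for my poor English and the stupid question. Thanks in advance.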

from operator import itemgetter
from collections import OrderedDict

d=dict()

with open("access.log", "r") as f:
    for line in f:
        line_split = line.split()
        list = line_split[5], line_split[8]
        url=line_split[8]
        string='407'
        if string in line_split[5]:
            if url in d:
                d[url]+=1
            else:
                d[url]=1


sorted_d = OrderedDict(sorted(d.items(), key=itemgetter(1)))

for element in sorted_d:
    parts=element.split(') ')
    print(parts)

Answer 1

for url, count in sorted_d.items():
    print(f'{url} {count}')

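Replace the last for loop with the code above.

To explain: the for loop unpacks each URL and its count from sorted_d.items(), and the f-string prints the URL and the count separated by a space.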
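If the counts should also line up in a column, as in the desired output in the question, the f-string can carry a field width. The following is only a minimal sketch: the `rows` list stands in for `sorted_d.items()`, and `format_rows` is a hypothetical helper name, not part of the original script.

```python
def format_rows(pairs):
    """Return one line per (url, count) pair with the counts in an aligned column."""
    pairs = list(pairs)
    # Pad every URL to the length of the longest one so the counts line up.
    width = max((len(url) for url, _ in pairs), default=0)
    return [f'{url:<{width}}  {count}' for url, count in pairs]


# Hypothetical sample data standing in for sorted_d.items()
rows = [
    ('http://example1.com', 1),
    ('http://example2.com', 5),
    ('http://example3.com', 10),
]

for line in format_rows(rows):
    print(line)
```

Because nothing is split on a separator, commas inside the URLs do not matter here. If the URLs still carry the trailing `"` seen in the question's sample list, `url.rstrip('"')` should remove it before printing.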

Answer 2

First of all, if you are already importing from the collections library, why not import Counter?

from collections import Counter

d = Counter()

with open("access.log", "r") as f:
    for line in f:
        line_split = line.split()
        url = line_split[8]
        # Count only the lines whose status field contains 407
        if '407' in line_split[5]:
            d[url] += 1

# most_common() yields the highest count first; use reversed(d.most_common()) for ascending order
for key, value in d.most_common():
    print(f'{key} {value}')
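To check the ordering without a real log file, here is a small sketch that runs the same counting on a few in-memory lines. The sample lines are made up: their layout (status at index 5, URL at index 8) only mirrors the indices used above and is not taken from a real access.log.

```python
from collections import Counter

# Hypothetical log lines: field 5 holds the status, field 8 holds the URL.
sample_lines = [
    "a b c d e TCP_DENIED/407 g h http://example2.com",
    "a b c d e TCP_DENIED/407 g h http://example1.com",
    "a b c d e TCP_DENIED/407 g h http://example2.com",
    "a b c d e TCP_MISS/200 g h http://example3.com",
]

counts = Counter()
for line in sample_lines:
    fields = line.split()
    if '407' in fields[5]:
        counts[fields[8]] += 1

# most_common() sorts from the highest count to the lowest;
# reversing it gives the ascending order shown in the question.
ascending = list(reversed(counts.most_common()))
for url, count in ascending:
    print(f'{url} {count}')
```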

Answer 3

There are many good tutorials on how to format strings in Python.

Here is some sample code that shows how to print a dictionary. I use the variables c1 and c2 to set the column widths.

c1 = 34; c2 = 10 
printstr = '\n|%s|%s|' % ('-'*c1, '-'*c2)
for key in sorted(d.keys()):
    val_str = str(d[key])
    printstr += '\n|%s|%s|' % (str(key).ljust(c1), val_str.rjust(c2))
printstr += '\n|%s|%s|\n\n' % ('-' * c1, '-' * c2)
print(printstr)

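The string method ljust() returns a string padded to the length passed as its argument, with the content left-aligned; rjust() does the same with the content right-aligned. A string that is already longer than the requested length is returned unchanged.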
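As a quick illustration of the two methods on their own, here is a minimal sketch. The values of `url` and `count` and the widths 25 and 5 are arbitrary choices for the example.

```python
url = 'http://example1.com'
count = 10

left = url.ljust(25)         # pads on the right, so the text stays left-aligned
right = str(count).rjust(5)  # pads on the left, so the text ends up right-aligned

print(f'|{left}|{right}|')
```

Both methods pad with spaces by default and never truncate, so a URL longer than the chosen width would push its row out of alignment rather than being cut off.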

How can I print this list in a readable form?
