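I need some help printing out a sorted txt log file. Printing works fine, except that I don't want to print the same IP number more than once.
Here is my code: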
# Read all lines of the access log into a list
with open("access_log.txt") as text_file:
    entire_file = text_file.readlines()

# Print IP, date and page for each line, newest first
for line in reversed(entire_file):
    try:
        arr = line.split(' ')
        date = arr[3]
        print(arr[0], "- - ", date[1:], " ", arr[6])
    except IndexError:
        pass  # skip lines with too few fields
As you can see, I only want to print the visiting IP number, the date and the page, but only once per IP.
Well, you can probably tell I'm a beginner =) Thanks
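For reference, here is what line.split(' ') yields on one line in Apache's Common Log Format. The question does not show its log, so the sample line below is an assumption; it shows why arr[0] is the IP, arr[3][1:] is the timestamp without its opening bracket, and arr[6] is the requested page.

```python
# Hypothetical sample line in Apache Common Log Format (not taken from the question)
line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326\n'

arr = line.split(' ')
ip = arr[0]          # the client IP
date = arr[3][1:]    # '[10/Oct/2000:13:55:36' with the leading '[' stripped
page = arr[6]        # the path from the request line
print(ip, "- - ", date, " ", page)
```

Note that the timezone ("-0700]") lands in arr[4], so it is not part of date.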
Answer 0 (score: 4)
# empty set of already seen ips:
seen_ips = set()
with open("access_log.txt") as f:
    for line in f:
        arr = line.split(' ')
        date = arr[3]
        # if the ip has not been seen yet, print it and add it to the seen_ips set:
        if arr[0] not in seen_ips:
            print(arr[0], "- - ", date[1:], " ", arr[6])
            seen_ips.add(arr[0])
        # else (i.e. ip already seen) ignore it and go on with the next line
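The same seen-set idea can be wrapped in a function that accepts any iterable of lines, which makes it easy to try on a small in-memory sample. This is a sketch: the name unique_ip_entries and the sample lines are hypothetical, and skipping lines with fewer than 7 fields is an assumption that mirrors the question's IndexError handling.

```python
def unique_ip_entries(lines):
    """Return (ip, date, page) for the first line seen from each IP.

    `lines` can be an open file or any iterable of log lines.
    Lines with fewer than 7 space-separated fields are skipped.
    """
    seen_ips = set()
    entries = []
    for line in lines:
        arr = line.split(' ')
        if len(arr) < 7:
            continue  # malformed line: not enough fields for IP, date and page
        ip = arr[0]
        if ip not in seen_ips:
            seen_ips.add(ip)
            entries.append((ip, arr[3][1:], arr[6]))
    return entries


# Hypothetical sample log lines
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 100\n',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /b.html HTTP/1.0" 200 100\n',
    '10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /c.html HTTP/1.0" 200 100\n',
    'garbage\n',
]
for ip, date, page in unique_ip_entries(sample):
    print(ip, "- - ", date, " ", page)
```

Because the set keeps the first line per IP, the order of the input decides which visit is shown. To keep the most recent visit per IP, as the question's reversed(...) loop suggests, pass reversed(f.readlines()) instead of f.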
Answer 1 (score: 0)
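You can use groupby() from itertools to group an iterable by a key you specify, and then work only with the key (or the first item in each group), as long as the iterable is sorted by that key: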
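from itertools import groupby
from operator import itemgetter
split = lambda l: l.split(' ')
with open("access_log.txt") as f:
    # sort so lines from the same IP (field 0) are adjacent, then group by IP
    for key, group in groupby(sorted(map(split, f)), key=itemgetter(0)):
        line = next(group)  # first line of this IP's group
        print(key, "- - ", line[3][1:], " ", line[6])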
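The "as long as it is sorted" condition matters because groupby() only merges consecutive items that share a key. A small sketch with made-up IPs shows the difference:

```python
from itertools import groupby

ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1"]  # hypothetical IPs

# groupby only merges *consecutive* items with the same key
unsorted_keys = [key for key, _ in groupby(ips)]
sorted_keys = [key for key, _ in groupby(sorted(ips))]

print(unsorted_keys)  # ['10.0.0.1', '10.0.0.2', '10.0.0.1']
print(sorted_keys)    # ['10.0.0.1', '10.0.0.2']
```

One consequence of sorting the split lines: the output is ordered by IP string rather than by log order, and the "first" line of each group is the one whose remaining fields sort smallest as strings, which is not necessarily the earliest visit, since a date like "10/Oct/2000" does not sort chronologically.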