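I'm new to Python and have been working through some tutorials on parsing logs with regular expressions. With the code below I can parse the log and create a file listing the remote IPs that connected to the server. What I'm missing is the part that removes duplicate IPs from the out.txt file it creates. Thanks.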
import re
import sys
infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)
for line in infile:
    result = regexp.search(line)
    if result:
        outfile.write("%s\n" % (result.group()))
infile.close()
outfile.close()
Answer 0 (score: 5)
You can keep the results seen so far in a set() and write out only those that have not been seen yet. This logic is easy to add to your existing code:
import re
import sys
seen = set()  # IP addresses already written to the output file
infile = open("/var/log/user.log","r")
outfile = open("/var/log/intruders.txt","w")
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
regexp = re.compile(pattern, re.VERBOSE)
for line in infile:
    mo = regexp.search(line)  # first IP-like match on the line, or None
    if mo is not None:
        ip_addr = mo.group()
        # write each address only the first time it appears
        if ip_addr not in seen:
            seen.add(ip_addr)
            outfile.write("%s\n" % ip_addr)
infile.close()
outfile.close()
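As a variation on the same set-based idea, here is a minimal sketch that moves the deduplication into a generator and opens the files in `with` blocks so they are closed even if an error occurs. The names `unique_ips` and `write_unique_ips` and the sample log lines are made up for illustration and are not part of the original code. Unlike the loop above, this version uses `findall`, so it also picks up a second IP on the same line; that is a deliberate change in behavior, not something the question asked for.

```python
import re

# Same loose IPv4 pattern as above; it does not check that octets are <= 255.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def unique_ips(lines):
    """Yield each IP-like string once, in order of first appearance."""
    seen = set()
    for line in lines:
        # findall returns every match on the line, not just the first one
        for ip_addr in IP_PATTERN.findall(line):
            if ip_addr not in seen:
                seen.add(ip_addr)
                yield ip_addr


def write_unique_ips(in_path, out_path):
    """Hypothetical helper: write the unique IPs found in in_path to out_path."""
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for ip_addr in unique_ips(infile):
            outfile.write("%s\n" % ip_addr)


# Made-up sample lines, used only to show the deduplication.
sample = [
    "Jan 1 sshd: Failed password from 10.0.0.1 port 22",
    "Jan 1 sshd: Failed password from 10.0.0.2 port 22",
    "Jan 1 sshd: Failed password from 10.0.0.1 port 22",
]
print(list(unique_ips(sample)))  # ['10.0.0.1', '10.0.0.2']
```

With the paths from the question, the call would be `write_unique_ips("/var/log/user.log", "/var/log/intruders.txt")`. If the order of first appearance does not matter, collecting everything into a set and writing `sorted(seen)` at the end would work as well.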