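So, I recently started learning Python, and at work we wanted a way to make finding specific keywords in our log files easier, so that it is easier to tell which IPs to add to our block list.
I decided to write a Python script that takes in a log file, takes in a file containing a list of key terms, looks for those key terms in the log file, and then writes every line matching the session ID where a keyword was found to a new file.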
import sys
import time
import linecache
from datetime import datetime

def timeStamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):
    return datetime.now().strftime(fmt).format(fname=fname)

importFile = open('rawLog.txt', 'r') #pulling in log file
importFile2 = open('keyWords.txt', 'r') #pulling in keywords
exportFile = open(timeStamped('ParsedLog.txt'), 'w') #writing the parsed log

FILE = importFile.readlines()
keyFILE = importFile2.readlines()

logLine = 1 #for debugging purposes when testing
parseString = ''
holderString = ''
sessionID = []
keyWords= []
j = 0

for line in keyFILE: #go through each line in the keyFile
    keyWords = line.split(',') #add each word to the array

print(keyWords)#for debugging purposes when testing, this DOES give all the correct results

for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords): #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
        j+=1 #advance the keyWords array

    logLine +=1 #for debugging purposes when testing
importFile2.close()
print(sessionID) #for debugging purposes when testing

importFile.seek(0) #goes back to the start of the file
i = 0
for line in FILE:
    if sessionID[i] in line[29:35]: #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs
        holderString = line #pulling the line of log file
        exportFile.write(holderString)#writing the log file line to a new text file
        print(holderString) #for debugging purposes when testing
        if i < len(sessionID):
            i+=1

importFile.close()
exportFile.close()
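It is not iterating through my list of keywords. I have probably made some stupid rookie mistake, but I don't have enough experience to see where I messed up. When I check the output, it only searches rawLog.txt for the first item in the keyWords list.
The third loop does return results for the session IDs that the second loop collected, and it tries to iterate over them (this raises an out-of-range exception, because i is never less than the length of the sessionID list, since sessionID only has 1 value).
The program does successfully write and name the new log file, using the DateTime followed by ParsedLog.txt.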
Answer 0 (score: 2)
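If the elif is never True, you never increment j, so you need to either always increment it or check whether the elif statement is actually evaluating to True.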
for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords): #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
    j+=1 # always increase (note: keyWords[j] raises IndexError once j reaches len(keyWords))
Looking at the loop above: you create a file object with importFile = open('rawLog.txt', 'r') in your code, so elif importFile == '' will never be True, because importFile is a file object, not a string.
You assign FILE = importFile.readlines(), which exhausts the iterator while building the FILE list. You then call importFile.seek(0), but you never actually use the file object again.
So basically you loop over FILE once, j never increments, and then your code moves on to the next block.
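Both points can be checked in isolation. The sketch below uses io.StringIO as a stand-in for the real log file, and the sample contents are made up for illustration:

```python
import io

# io.StringIO stands in for open('rawLog.txt'); the contents are made up.
import_file = io.StringIO("first line\nsecond line\n")

lines = import_file.readlines()  # consumes the whole stream

# A file object never compares equal to a string, even at end of file.
at_eof_equals_empty = (import_file == '')

# The stream is exhausted now, so a second readlines() returns nothing.
second_read = import_file.readlines()

# seek(0) rewinds the stream; the list built earlier is unaffected by it.
import_file.seek(0)
reread = import_file.readlines()

print(at_eof_equals_empty)  # False
print(second_read)          # []
print(reread == lines)      # True
```

Since the loop iterates over the list and not over the file object, rewinding the file has no effect on it.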
What you really need is nested loops: use any to check whether any word in keyWords appears in each line, and forget your elif:
for line in FILE:
    if any(word in line for word in keyWords):
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
The same logic applies to your next loop:
for line in FILE:
    if any(sess in line[29:35] for sess in sessionID): #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs
        exportFile.write(line)#writing the log file line to a new text file
holderString = line does nothing except make a second name refer to the same object as line, so you can simply call exportFile.write(line) and forget the assignment.
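To see the two passes working together, here is a minimal sketch on made-up log lines. The only thing carried over from the question is the assumption that the session ID sits in line[29:35]; the make_line helper, the sample messages and the keywords are hypothetical. It also swaps the list for a set and compares the ID column for equality, which is a small variation on the any(...) test above, not the same code:

```python
def make_line(session, message):
    """Build a fake log line with the session ID in columns 29-34."""
    prefix = "2015-06-01 12:00:00 [host01]".ljust(29)
    return prefix + session + " " + message + "\n"

log_lines = [
    make_line("AAA111", "login failed for admin"),
    make_line("BBB222", "page served"),
    make_line("AAA111", "connection from 10.0.0.5"),
    make_line("CCC333", "sql injection attempt"),
    make_line("BBB222", "logout"),
]
key_words = ["login failed", "injection"]

# Pass 1: collect the session ID of every line containing any keyword.
session_ids = {line[29:35] for line in log_lines
               if any(word in line for word in key_words)}

# Pass 2: keep every line whose session ID column is one of those IDs.
matching = [line for line in log_lines if line[29:35] in session_ids]

print(sorted(session_ids))  # ['AAA111', 'CCC333']
print(len(matching))        # 3
```

A set also removes duplicate IDs, which the list version would collect once per matching line.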
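As a side note, use lowercase with underscores for variable names and the like, e.g. holderString -> holder_string. Opening files with with is also best, because it closes them for you.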
with open('rawLog.txt') as import_file:
    log_lines = import_file.readlines()
I also changed FILE to log_lines; using more descriptive names makes your code easier to follow.
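Putting these suggestions together, the whole script might look roughly like the sketch below. The function names read_keywords and parse_log are made up for illustration. Two details go beyond what is discussed above and are assumptions on my part: keywords are accumulated across all lines of the keyword file (the question's loop keeps only the last line), and each keyword is stripped, because split(',') leaves the trailing newline attached to the last keyword on a line.

```python
from datetime import datetime

def time_stamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):
    """Prefix fname with the current timestamp, as in the question."""
    return datetime.now().strftime(fmt).format(fname=fname)

def read_keywords(path):
    """Collect comma-separated keywords from every line of the file."""
    key_words = []
    with open(path) as key_file:
        for line in key_file:
            # strip() drops surrounding whitespace, including the newline
            # that split(',') leaves on the last keyword of each line
            key_words.extend(w.strip() for w in line.split(',') if w.strip())
    return key_words

def parse_log(log_path, key_words, out_path):
    """Write every log line whose session ID appeared next to a keyword."""
    with open(log_path) as import_file:
        log_lines = import_file.readlines()

    # session ID assumed to live in columns 29-34, as in the question
    session_ids = {line[29:35] for line in log_lines
                   if any(word in line for word in key_words)}

    with open(out_path, 'w') as export_file:
        for line in log_lines:
            if line[29:35] in session_ids:
                export_file.write(line)
    return session_ids

# Usage, with the file names from the question:
# parse_log('rawLog.txt', read_keywords('keyWords.txt'),
#           time_stamped('ParsedLog.txt'))
```

Every file is closed as soon as its with block ends, so no explicit close() calls are needed.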
Answer 1 (score: 2)
It looks to me like your second loop needs an inner loop rather than an inner if statement. E.g.
for line in FILE:
    for word in keyWords:
        if word in line:
            parseString = line[29:35] #pulling in session ID
            sessionID.append(parseString) #saving session IDs to a list
            break # Assuming there will only be one keyword per line, else remove this
    logLine +=1 #for debugging purposes when testing
importFile2.close()
print(sessionID) #for debugging purposes when testing
Assuming I've understood correctly, that is.
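For comparison, here is a small sketch of the inner loop with break next to the equivalent any(...) form, run on made-up lines. The function names and sample data are hypothetical, and the session ID is again assumed to sit in line[29:35]. One thing worth noting: with the break, a line containing two keywords is recorded once; without it, the same ID would be appended once per matching keyword.

```python
def ids_with_loop(log_lines, key_words):
    """Explicit inner loop, stopping at the first keyword found."""
    session_ids = []
    for line in log_lines:
        for word in key_words:
            if word in line:
                session_ids.append(line[29:35])
                break  # one match per line is enough
    return session_ids

def ids_with_any(log_lines, key_words):
    """Same selection expressed with any()."""
    return [line[29:35] for line in log_lines
            if any(word in line for word in key_words)]

pad = " " * 29  # filler so that the ID lands in columns 29-34
sample = [
    pad + "AAA111 alpha beta\n",   # contains both keywords
    pad + "BBB222 gamma\n",        # contains neither
    pad + "CCC333 beta\n",
]
words = ["alpha", "beta"]

print(ids_with_loop(sample, words))  # ['AAA111', 'CCC333']
print(ids_with_loop(sample, words) == ids_with_any(sample, words))  # True
```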