I have a cache log file from which I need to remove every line whose URL contains ".js?", ".gif?", or ".png?".
logfile = open('/home/prasanna/Downloads/processed_file', 'r')
cleanfile = open('/home/prasanna/Downloads/cleaned_file', 'a')
with logfile:
    for line in logfile:
        line_words = line.split()
        url = line_words[6].split('.')
        # pattern if_condition
        cleanfile.write(line)
cleanfile.close()
logfile.close()
Whenever a line in processed_file does not contain any of the patterns above, I need to write that line to the cleaned file.
Example:
1168414758.369 723 80.126.67.6 TCP_MISS/304 380 GET http://c.msn.com/c.gif?[07lKw.F:jbQg5CY03lJ8T.] - DIRECT/207.46.216.62 -
1168416013.376 621 233.7.37.201 TCP_MISS/304 162 GET http://mobile9.us.intellitxt.com/v3/func_033.js?[15zZlncWMGXv5PQNupu.tC] - DIRECT/205.147.84.25 -
Answer 0: (score: 1)
If by "remove" you mean not writing the line to the cleaned file, then a simple check should do.
logfile = open('/home/prasanna/Downloads/processed_file', 'r')
cleanfile = open('/home/prasanna/Downloads/cleaned_file', 'a')
# The with statement closes both files when the block ends.
with logfile, cleanfile:
    for line in logfile:
        # Keep the line only if none of the unwanted patterns appears in it.
        if ".gif?" not in line and ".png?" not in line and ".js?" not in line:
            cleanfile.write(line)
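The check above tests the whole line. If only the URL field should be tested, as the `line_words[6]` in the question suggests, something like the following sketch might work. It assumes the URL is the seventh whitespace-separated field (which holds when `TCP_MISS/304` and `DIRECT/207.46.216.62` are each a single field). The names `UNWANTED`, `URL_INDEX`, `is_unwanted`, and `filter_lines` are made up for this illustration.

```python
UNWANTED = ('.js?', '.gif?', '.png?')  # patterns named in the question
URL_INDEX = 6  # assumption: the URL is the 7th whitespace-separated field


def is_unwanted(line, url_index=URL_INDEX):
    """Return True if the line's URL field contains one of the patterns."""
    fields = line.split()
    if len(fields) <= url_index:
        # Short or malformed line: keep it rather than raise IndexError.
        return False
    url = fields[url_index]
    return any(pattern in url for pattern in UNWANTED)


def filter_lines(lines, url_index=URL_INDEX):
    """Return the lines whose URL field matches none of the patterns."""
    return [line for line in lines if not is_unwanted(line, url_index)]
```

Since an open file is iterable line by line, `cleanfile.writelines(filter_lines(logfile))` would write the surviving lines. Checking only the URL field avoids dropping a line just because some other field happens to contain one of the patterns.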
Answer 1: (score: 0)
Is it really that hard?
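with open('/home/prasanna/Downloads/processed_file') as file, \
        open('/home/prasanna/Downloads/cleaned_file', 'a') as cleaned:
    for line in file:
        if ".gif?" in line or ".png?" in line or ".js?" in line:
            continue  # skip unwanted lines
        cleaned.write(line)  # keep everything else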
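The three substring tests can also be collapsed into a single regular expression. This is only a sketch: `UNWANTED_RE` and `clean_log` are hypothetical names, and the output file is opened in `'w'` mode (not `'a'` as in the question) so that rerunning the script does not append duplicate lines.

```python
import re

# One pattern for all three cases: a literal dot, one of the extensions,
# then a literal question mark.
UNWANTED_RE = re.compile(r'\.(?:js|gif|png)\?')


def clean_log(src_path, dst_path):
    """Copy src_path to dst_path, skipping lines that match UNWANTED_RE.

    Returns the number of lines written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            if UNWANTED_RE.search(line) is None:
                dst.write(line)
                kept += 1
    return kept
```

Requiring the trailing `?` matters here: a plain `.js` test would also drop URLs ending in `.json` or `.jsp`, which the question does not ask for.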