Finding a pattern in a URL with Python

Time: 2014-11-06 01:56:43

Tags: python regex

I have a cache log file from which I need to remove every line whose URL contains ".js?", ".gif?", or ".png?".

logfile=open('/home/prasanna/Downloads/processed_file','r')
cleanfile=open('/home/prasanna/Downloads/cleaned_file','a')
with logfile:
    for line in logfile:
        line_words=line.split()
        url=line_words[6].split('.')
        # The missing pattern check (an if condition) belongs here;
        # without it, every line is copied to the cleaned file.
        cleanfile.write(line)
cleanfile.close()

Whenever a line in processed_file does not contain any of the patterns above, I need to write that line to the cleaned file.

For example: 1168414758.369 723 80.126.67.6 TCP_MISS/304 380 GET http://c.msn.com/c.gif?[07lKw.F:jbQg5CY03lJ8T.] - DIRECT/207.46.216.62 -

1168416013.376 621 233.7.37.201 TCP_MISS/304 162 GET http://mobile9.us.intellitxt.com/v3/func_033.js?[15zZlncWMGXv5PQNupu.tC] - DIRECT/205.147.84.25 -
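The question reads the URL from a fixed position (`line_words[6]`), which only works if every line splits into the same number of fields. As a rough sketch, the URL could instead be located by its scheme prefix. The helper name `find_url` is hypothetical, and the sketch assumes the URL always starts with `http://` or `https://`.

```python
def find_url(line):
    """Return the first whitespace-separated field that looks like a URL, or None."""
    for field in line.split():
        # Assumption: logged URLs start with an http or https scheme
        if field.startswith(("http://", "https://")):
            return field
    return None


sample = ("1168414758.369 723 80.126.67.6 TCP_MISS/304 380 GET "
          "http://c.msn.com/c.gif?[07lKw.F:jbQg5CY03lJ8T.] - DIRECT/207.46.216.62 -")
print(find_url(sample))
```

Lines without such a field (for instance a request logged only as host and port) would return `None`, so the caller has to decide whether to keep or drop them.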

2 Answers:

Answer 0: (score: 1)

If by "remove" you mean not writing the line to the cleaned file, a simple membership check should be enough.

# Both files are closed automatically when the with block ends
with open('/home/prasanna/Downloads/processed_file','r') as logfile, \
     open('/home/prasanna/Downloads/cleaned_file','a') as cleanfile:
    for line in logfile:
        # Write the line only if none of the unwanted patterns occurs in it
        if ".gif?" not in line and ".png?" not in line and ".js?" not in line:
            cleanfile.write(line)
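Since the question is tagged regex, the same check can also be sketched with the `re` module. This is one possible variant, not the answerer's method: it only looks for the patterns inside a field that starts with `http://` or `https://`, so a ".js?" elsewhere on the line is ignored. The names `UNWANTED_URL` and `keep_line` are hypothetical.

```python
import re

# An http(s) URL (no whitespace) containing ".js?", ".gif?" or ".png?"
UNWANTED_URL = re.compile(r"https?://\S*?\.(?:js|gif|png)\?")


def keep_line(line):
    """Return True if the line has no URL with an unwanted pattern."""
    return UNWANTED_URL.search(line) is None


sample_lines = [
    "1168414758.369 723 80.126.67.6 TCP_MISS/304 380 GET "
    "http://c.msn.com/c.gif?[07lKw.F:jbQg5CY03lJ8T.] - DIRECT/207.46.216.62 -\n",
    # Made-up line for illustration only
    "1168414759.000 100 10.0.0.1 TCP_HIT/200 512 GET "
    "http://example.com/index.html - NONE/- text/html\n",
]
kept = [line for line in sample_lines if keep_line(line)]
```

In the file-based loop, `if keep_line(line):` would take the place of the chained `not in` tests.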

Answer 1: (score: 0)

Is it really that hard?

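# logfile and cleanfile are the open file objects from the question
for line in logfile:
    # Skip any line that contains one of the unwanted patterns
    if ".gif?" in line or ".png?" in line or ".js?" in line:
        continue
    cleanfile.write(line)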
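If the list of patterns is likely to grow, the chained `or` tests can be folded into a single `any()` call. The following is a minimal sketch under that assumption; `PATTERNS` and `clean_lines` are hypothetical names, and the function works on any iterable of lines, including an open file.

```python
PATTERNS = (".js?", ".gif?", ".png?")


def clean_lines(lines, patterns=PATTERNS):
    """Yield only the lines that contain none of the given patterns."""
    for line in lines:
        if not any(pattern in line for pattern in patterns):
            yield line


# Made-up lines for illustration only
demo = [
    "GET http://example.com/a.gif?[x]\n",
    "GET http://example.com/page.html\n",
    "GET http://example.com/script.js?[y]\n",
]
cleaned = list(clean_lines(demo))
```

With real files, `cleanfile.writelines(clean_lines(logfile))` would write the surviving lines in one call.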