Python: replace spaces in a text file with commas

Time: 2012-02-03 21:03:43

Tags: python

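My Python is rather rusty, and I would like to know whether there is a better or more efficient way to write this script.

The purpose of the script is to take a .txt log and replace ' ' with ',' to create a .csv, which makes the log easier to read.

Any suggestions or comments would be greatly appreciated. Thanks.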

import sys
import os
import datetime

t = datetime.datetime.now() ## current local time (now() does not return UTC)
ts = t.strftime("%Y_%m_%d_ %H_%M_%S") # Date format


if len(sys.argv) != 2: # exactly one argument (the log name) is expected
    print ("usage: programname.py logname.ext")
    sys.exit(1)

logSys = sys.argv[1] 
newLogSys = sys.argv[1] + "_" + ts +".csv"

log = open(logSys,"r")
nL = file(newLogSys  ,"w") 


# Read from file log and write to nLog file
for lineI in log.readlines():
    rec=lineI.rstrip()
    if rec.startswith("#"):
        lineI=rec.replace(':',',').strip() 
        nL.write(lineI + "\n")
    else:
        lineO=rec.replace(' ',',').strip() #
        nL.write(lineO + "\n") 

## closes both open files; End script
nL.close()
log.close()

=====Sample log========
#Date: 2008-04-18 15:41:16
#Fields: date time time-taken c-ip cs-username cs-auth-group x-exception-id sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id
2012-02-02 16:19:01 14 xxx.xxx.xxx.xxx user domain\group dns_unresolved_hostname DENIED "Games" -  404 TCP_ERR_MISS POST - http updaterservice.wildtangent.com 80 /appupdate/appcheckin.wss - wss "Mozilla/4.0 (compatible; MSIE 8.0; Win32)" xxx.xxx.xxx.xxx 824 697 -

4 answers:

Answer 0 (score: 3)

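  1. Don't iterate with readlines. A plain for lineI in log iterates over all the lines without reading the whole file into memory.
  2. You strip the newline with rstrip, but then add it back.
  3. The purpose of strip is unclear, especially since you have already replaced all the spaces with commas.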
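
The three points above can be put together in a short sketch. The helper name convert_lines is hypothetical (it is not from the question), and the sketch assumes the input yields text lines that still carry their newline.

```python
import io


def convert_lines(lines):
    """Yield each line with ':' (comment lines) or ' ' (data lines) turned into ','.

    `lines` can be any iterable of strings, e.g. an open file object,
    so the whole file never has to be held in memory at once.
    """
    for line in lines:
        # The trailing newline is left untouched, so nothing is stripped
        # and nothing needs to be added back.
        if line.startswith("#"):
            yield line.replace(":", ",")
        else:
            yield line.replace(" ", ",")


# Usage with an in-memory stand-in for the log file (made-up sample data).
sample = io.StringIO("#Date: 2008-04-18\n2012-02-02 16:19:01 14\n")
converted = list(convert_lines(sample))
print(converted)
```

Because the function is a generator, its result can be passed straight to writelines on the output file.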

Answer 1 (score: 2)

I would shorten your code to:

import sys
import os
from time import strftime

if len(sys.argv) != 2: # exactly one argument (the log name) is expected
    print ("usage: programname.py logname.ext")
    sys.exit(1)

logSys    = sys.argv[1]
newLogSys = "%s_%s.csv" % (logSys,strftime("%Y_%m_%d_ %H_%M_%S"))

with open(logSys,'rb') as log, open(newLogSys,'wb') as nL:
    nL.writelines(lineI.replace(':' if lineI[0]=='#' else ' ', ',')
                  for lineI in log)
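
The snippet above opens both files in binary mode, which works on Python 2. On Python 3 the lines read in binary mode are bytes, so lineI[0]=='#' is never true and replace with string arguments raises TypeError. What follows is a minimal Python 3 sketch of the same idea, assuming text mode with newline='' so that line endings pass through unchanged. The function name convert_log and the file names in the usage part are hypothetical.

```python
import os
import tempfile
from time import strftime


def convert_log(src_path, dst_path):
    """Copy src_path to dst_path, replacing ':' in '#' lines and ' ' elsewhere with ','."""
    # newline='' disables newline translation, so '\r\n' stays '\r\n'.
    with open(src_path, "r", newline="") as log, \
         open(dst_path, "w", newline="") as out:
        out.writelines(
            line.replace(":" if line.startswith("#") else " ", ",")
            for line in log
        )


# Usage on a throwaway file (made-up content).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "sample.log")
dst = "%s_%s.csv" % (src, strftime("%Y_%m_%d_%H_%M_%S"))
with open(src, "w", newline="") as f:
    f.write("#Fields: date time\r\n2012-02-02 16:19:01\r\n")

convert_log(src, dst)
with open(dst, "r", newline="") as f:
    result = f.read()
print(repr(result))
```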

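Edit

I still don't understand what you mean about an extra line (that is, a '\n') being added for the lines that do not start with '#'.

I ran the following code with your sample, and I did not observe what you describe. Sorry, but I cannot propose a solution to a problem I do not see.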

from time import strftime
import re

ss = ('--||  ||:|||:||--||| \r\n'
      '#10 23:30 abcdef : \r\n'
      '802 12:25 xyz  :  \r\n'
      '\r\n'
      '#:35 11:18+14:39 sunny vale : sunny sea\r\n'
      '  651454451 drh:hdb 54:1\r\n'
      '    \r\n'
      ': 541514 oi:npvert654165:8\r\n'
      '#5415:v541564zervt\r\n'
      '#     ::    \r\n'
      '#::: :::\r\n'
      ' E\r\n')

regx = re.compile('(\r?\n(?!$))|(\r?\n$)')

def smartdispl(com,smth,regx = regx):
    print '\n%s\n%s\n%s' %\
          ('{0:{fill}{align}70}'.format(' %s ' % com,fill='=',align='^'),
           '\n'.join(repr(el) for el in smth.splitlines(1)),
           '{0:{fill}{align}70}'.format('',fill='=',align='^'))

logSys = 'poiu.txt'

with open(logSys,'wb') as f:
    f.write(ss)

with open(logSys,'rb') as f:
    smartdispl('content of the file '+logSys,f.read())

newLogSys = "%s_%s.csv" % (logSys,strftime("%Y_%m_%d_ %H_%M_%S"))

with open(logSys,'rb') as log, open(newLogSys,'wb') as nL:
    nL.writelines(lineI.replace(':' if lineI[0]=='#' else ' ', ',')
                  for lineI in log)

with open(newLogSys,'rb') as f:
    smartdispl('content of the file '+newLogSys,f.read())

Result

==================== content of the file poiu.txt ====================
'--||  ||:|||:||--||| \r\n'
'#10 23:30 abcdef : \r\n'
'802 12:25 xyz  :  \r\n'
'\r\n'
'#:35 11:18+14:39 sunny vale : sunny sea\r\n'
'  651454451 drh:hdb 54:1\r\n'
'    \r\n'
': 541514 oi:npvert654165:8\r\n'
'#5415:v541564zervt\r\n'
'#     ::    \r\n'
'#::: :::\r\n'
' E\r\n'
======================================================================

======= content of the file poiu.txt_2012_02_07_ 00_48_55.csv ========
'--||,,||:|||:||--|||,\r\n'
'#10 23,30 abcdef , \r\n'
'802,12:25,xyz,,:,,\r\n'
'\r\n'
'#,35 11,18+14,39 sunny vale , sunny sea\r\n'
',,651454451,drh:hdb,54:1\r\n'
',,,,\r\n'
':,541514,oi:npvert654165:8\r\n'
'#5415,v541564zervt\r\n'
'#     ,,    \r\n'
'#,,, ,,,\r\n'
',E\r\n'
======================================================================

Answer 2 (score: 1)

Using @larsmans's suggestion and removing the code duplication from the writing part:

# Read from file log and write to nLog file
for line in log:
    if line.startswith("#"): 
        line = line.replace(':',',')
    else: 
        line = line.replace(' ',',')
    nL.write(line) 

Answer 3 (score: 0)

If you want succinctness, try this version:

    for line in log:
        # split(':') keeps the trailing newline, so strip it before one is re-added
        if line[0] == '#': line = ','.join(line.rstrip('\r\n').split(':'))
        else: line = ','.join(line.split())
        nL.write(line + '\n')
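
A short sketch of how split/join differs from replace on the two kinds of line. The sample strings are made up for illustration. str.split() with no argument collapses runs of whitespace and drops the trailing newline, while split(':') leaves the newline in the last piece.

```python
data_line = "a  b c\n"       # two spaces between 'a' and 'b'
comment_line = "#Date: x\n"

# replace() turns every single space into a comma and keeps the newline.
replaced = data_line.replace(" ", ",")

# split() with no argument treats a run of whitespace as one separator
# and discards leading/trailing whitespace, including the newline.
joined = ",".join(data_line.split())

# split(':') only splits on ':'; the trailing newline stays in the last piece.
comment_joined = ",".join(comment_line.split(":"))

print(repr(replaced))        # 'a,,b,c\n'
print(repr(joined))          # 'a,b,c'
print(repr(comment_joined))  # '#Date, x\n'
```

This is why a newline has to be appended after split() on a data line, and why a comment line needs its own newline removed before another one is appended. It also means this version produces no empty fields for repeated spaces, unlike the replace-based versions.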