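I have several log files, most of them more than 1 million lines long. I want to delete the first three lines of each file, as well as the first 9 characters of the fourth line.
I can delete the first 3 lines, but I haven't been able to figure out how to delete the first 9 characters of line 4 while keeping the rest of the document.
Sample data: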
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2015-06-02 00:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
Desired output:
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
My code so far:
import os
import shutil

for filename in os.listdir(path):
    basename, ext = os.path.splitext(filename)
    fullname = os.path.join(path, filename)
    newname = os.path.join(path, basename + '-out' + ext)
    with open(fullname) as read:
        # skip first 3 lines
        for n in range(3):
            read.readline()
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)
Answer 0 (score: 1)
You are very close:
import os
import shutil

for filename in os.listdir(path):
    basename, ext = os.path.splitext(filename)
    fullname = os.path.join(path, filename)
    newname = os.path.join(path, basename + '-out' + ext)
    with open(fullname) as read:
        # skip first 3 lines
        for n in range(3):
            read.readline()
        # consume the 9-character "#Fields: " prefix  <-- ADDED THIS
        read.read(9)
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)
Answer 1 (score: 0)
You are 99% of the way there. All that remains is to advance the read pointer by 9 characters before copying.
        # skip first 3 lines
        for n in range(3):
            read.readline()
        # skip 9 characters
        read.read(9)
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)
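The approach in the answers above can be wrapped in a self-contained Python 3 function. This is only a minimal sketch, not code from the thread: the name `strip_header` and the parameters `skip_lines` and `skip_bytes` are hypothetical, and it assumes the fourth line starts with a pure-ASCII 9-character prefix such as `#Fields: `, so that 9 bytes equal 9 characters.

```python
import shutil


def strip_header(src_path, dst_path, skip_lines=3, skip_bytes=9):
    """Copy src_path to dst_path, dropping the first skip_lines lines
    and the first skip_bytes bytes of the line that follows them."""
    # Binary mode copies line endings and encoding through unchanged.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for _ in range(skip_lines):
            src.readline()          # discard one header line
        src.read(skip_bytes)        # discard the "#Fields: " prefix
        shutil.copyfileobj(src, dst)  # stream the remainder in chunks
```

Because `shutil.copyfileobj` streams the rest of the file in chunks, the function should not need to hold a million-line log in memory. It could be called once per file from the `os.listdir` loop, for example `strip_header(fullname, newname)`.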
Answer 2 (score: 0)
Thanks for the information. Although I couldn't get the read.read() option to work, the comment about moving the read pointer forward pointed me in the right direction.
I chose to advance the pointer position by 108 and then read the file.
Final code that works:
import os
import shutil

for filename in os.listdir(path):
    basename, ext = os.path.splitext(filename)
    fullname = os.path.join(path, filename)
    newname = os.path.join(path, basename + '-out' + ext)
    with open(fullname) as read:
        # jump past the 3 header lines and the "#Fields: " prefix
        # (hard-coded byte offset)
        read.seek(108)
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)
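The offset of 108 matches the sample header only when lines end with CRLF: the first three lines are 54, 13, and 26 characters long, each followed by 2 bytes of line ending, and `#Fields: ` adds 9 more (56 + 15 + 28 + 9 = 108). A different version string or LF-only line endings would shift that number. As a hedged alternative, the offset could be measured per file rather than hard-coded. In the sketch below, `header_offset` is a hypothetical helper name, and the check assumes the fourth line begins with `#Fields: `.

```python
def header_offset(path, skip_lines=3, prefix=b"#Fields: "):
    """Return the byte offset just past the header lines and the prefix."""
    with open(path, "rb") as f:
        for _ in range(skip_lines):
            f.readline()                 # step over one header line
        if f.read(len(prefix)) != prefix:
            # Refuse to guess when the file does not look as expected.
            raise ValueError("unexpected header in %s" % path)
        return f.tell()                  # position right after the prefix
```

The returned value can be passed to `seek()` on a file opened in binary mode before handing it to `shutil.copyfileobj`.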