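I have a very large file that contains wrong information.
I wrote a script. Whenever it encounters xxx, it should replace that line with 'something' and replace the following line with 'stg'.
Here is the script: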
    subject = 'problematic.txt'
    pattern = 'xxx'
    subject2 = 'resolved.txt'
    output = open(subject2, 'w')
    line1 = 'something'
    line2 = 'stg'
    with open(subject) as myFile:
        for num, line in enumerate(myFile, 1):  # to get the line number
            if pattern in line:
                print('found at line: %d' % num)
                line = line1  # replace the line containing xxx with 'something'
                output.write(line)
                line = next(myFile, "")  # move to the next line
                line = line2  # replace the next line with 'stg'
                output.write(line)
            else:
                output.write(line)  # save as is
    output.close()
    myFile.close()
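It works for the first occurrence of xxx but not for the subsequent ones. The reason is that next() advances the iteration, so my script makes its changes at the wrong positions.
Here is the output:
    found at line: 3
    found at line: 6
instead of:
    found at line: 3
    found at line: 7
So the changes are not made at the right position... Ideally, undoing the next() after replacing the line with line2 would solve my problem, but I could not find a previous() function. Anyone? Thanks!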
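Answer 0 (score: 2)
Your current code almost works. I believe it correctly identifies and replaces the right lines of the input file, but it reports the wrong line numbers for the matches, because the enumerate generator never sees the lines skipped by next(myFile).
While you could rewrite it in various ways, as the other answers suggest, you don't need to make major changes (unless you want to for other design reasons). Here is the code with the minimal change, which is pointed out by the new comments: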
    with open(subject) as myFile:
        gen = enumerate(myFile, 1)  # save the enumerate generator to a variable
        for num, line in gen:  # iterate over it, as before
            if pattern in line:
                print('found at line: %d' % num)
                line = line1
                output.write(line)
                next(gen, None)  # advance the generator and throw away the result
                line = line2
                output.write(line)
            else:
                output.write(line)
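As a quick check of this diagnosis, here is a small sketch that runs both variants on an in-memory file. The helper name report_matches and the sample data are made up for illustration, io.StringIO stands in for the real file, and the replacement strings carry a trailing newline so the output stays one item per line.

```python
import io

SAMPLE = "a\nb\nxxx\nold\nc\nd\nxxx\nold\ne\n"  # matches on lines 3 and 7


def report_matches(text, advance_enumerate):
    """Return (reported line numbers, output text) for one variant.

    advance_enumerate=False mimics the question: next() is called on the
    file, so enumerate never counts the skipped line.
    advance_enumerate=True mimics the fix: next() is called on the
    enumerate generator itself.
    """
    source = io.StringIO(text)
    gen = enumerate(source, 1)
    found, out = [], []
    for num, line in gen:
        if "xxx" in line:
            found.append(num)
            out.append("something\n")
            # Skip the line after the match, one way or the other.
            next(gen if advance_enumerate else source, None)
            out.append("stg\n")
        else:
            out.append(line)
    return found, "".join(out)


print(report_matches(SAMPLE, advance_enumerate=False)[0])  # [3, 6]
print(report_matches(SAMPLE, advance_enumerate=True)[0])   # [3, 7]
```

Both variants write the same output text; only the reported line numbers differ.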
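Answer 1 (score: 1)
When you think you need to look ahead, it is almost always easier to restate the problem in terms of looking back. In this case, just keep track of the previous line and check whether it matches the target string.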
    infilename = "problematic.txt"
    outfilename = "resolved.txt"
    pattern = "xxx"
    replace1 = "something"
    replace2 = "stg"
    with open(infilename) as infile:
        with open(outfilename, "w") as outfile:
            previous = ""
            for linenum, current in enumerate(infile):
                if pattern in previous:
                    print("found at line %d" % linenum)
                    previous, current = replace1, replace2
                if linenum:  # skip the first (blank) previous line
                    outfile.write(previous)
                previous = current
            outfile.write(previous)  # write the final line (it is never tested against pattern)
Answer 2 (score: 0)
You can zip the lines like this to get two pointers at once:
    with open(subject) as myFile:
        lines = myFile.readlines()
        for current, following in zip(lines, lines[1:]):
            ...
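Edit: the above only demonstrates the idea of zipping the lines. For large files, iterate over the file lazily instead of calling readlines(). A file object is its own iterator, so a second name for it (it1 = myFile) shares the same position; use itertools.tee to get two independent cursors:
    from itertools import tee
    with open(subject) as myFile:
        it1, it2 = tee(myFile)
        next(it2, None)  # advance the second cursor by one line
        for current, following in zip(it1, it2):
            ...
Note that the file is already iterable, so there is no need to wrap it in iter() first.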
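To see why a second reference to the same file object is not enough for a sliding pair of lines, here is a small check. It is only a sketch: io.StringIO stands in for the file, and the variable names are illustrative.

```python
import io
from itertools import tee

text = "1\n2\n3\n4\n5\n"

# Variant A: two names for the same file object share one position.
f = io.StringIO(text)
alias = f
next(f)
shared = [(a.strip(), b.strip()) for a, b in zip(alias, f)]

# Variant B: tee() gives two independent cursors over the same lines.
first, second = tee(io.StringIO(text))
next(second, None)
independent = [(a.strip(), b.strip()) for a, b in zip(first, second)]

print(shared)       # [('2', '3'), ('4', '5')]
print(independent)  # [('1', '2'), ('2', '3'), ('3', '4'), ('4', '5')]
```

With tee(), the two cursors are only ever one line apart, so only a single line has to be buffered regardless of the file size.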
Answer 3 (score: 0)
This seems to work whether the string to be replaced appears on odd or on even lines:
    with open('test.txt', 'r') as f:
        for line in f:
            line = line.strip()
            if line == 'apples':  # to be replaced
                print('manzanas')  # replacement 1
                print('y más manzanas')  # replacement 2
                next(f, None)  # skip the following line; no error at end of file
                continue
            print(line)
Sample input:
    apples
    pears
    apples
    pears
    pears
    apples
    pears
    pears
Sample output:
    manzanas
    y más manzanas
    manzanas
    y más manzanas
    pears
    manzanas
    y más manzanas
    pears
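Answer 4 (score: 0)
There is no previous() function because that is not how the iterator protocol works. For generators in particular, the concept of a "previous" element may not even exist.
Instead, iterate over the file with two cursors and zip them together:
    from itertools import tee
    with open(subject) as f:
        its = tee(f)
        next(its[1], None)  # advance the second iterator past the first line
        for first, second in zip(*its):  # in Python 2, use itertools.izip
            pass  # do something with first and/or second, comparing them appropriately
The above is like doing for line in f:, except that you now have the current line in first and the line immediately following it in second.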
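Answer 5 (score: 0)
I would just set a flag indicating that you want to skip the next line, and check it in the loop instead of using next():
    with open(subject) as myFile:
        skip = False
        for line in myFile:
            if skip:
                skip = False  # this is the line after a match: drop it
                continue
            if pattern in line:
                output.write("something")
                output.write("stg")
                skip = True
            else:
                output.write(line)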
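The same flag idea can be packaged as a generator so it can be tried without touching any files. This is a minimal sketch: the function name replace_after_match is hypothetical, io.StringIO stands in for the input file, and the replacement strings are given a trailing newline.

```python
import io


def replace_after_match(lines, pattern, first, second):
    """Yield lines, swapping each match and the line after it."""
    skip = False
    for line in lines:
        if skip:
            skip = False  # drop the line that followed a match
            continue
        if pattern in line:
            yield first
            yield second
            skip = True
        else:
            yield line


source = io.StringIO("a\nxxx\nold\nb\nxxx\n")
result = "".join(replace_after_match(source, "xxx", "something\n", "stg\n"))
print(result)
```

Note that a match on the very last line still produces both replacement lines, since there is simply nothing left to skip.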
Answer 6 (score: 0)
You need to buffer the lines somehow. For a single line this is easy to do:
    class Lines(object):
        def __init__(self, f):
            self.f = f  # file object
            self.prev = None  # buffered line, or None if nothing is buffered
        def next(self):
            if self.prev is None:
                try:
                    self.prev = next(self.f)
                except StopIteration:
                    return None  # end of file
            return self.prev
        def consume(self):
            self.prev = None  # drop the buffered line; next() then fetches a new one
Now you call Lines.next() to get the next line and Lines.consume() to consume it. A line stays buffered until it is consumed:
    >>> f = open("table.py")
    >>> lines = Lines(f)
    >>> lines.next()
    'import itertools\n'
    >>> lines.next()  # same line
    'import itertools\n'
    >>> lines.consume()  # remove the currently buffered line
    >>> lines.next()  # next line
    '\n'
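For completeness, here is one way such a one-line buffer could be applied to the original replacement problem. It is a sketch under assumptions rather than the answer's own code: the class name PeekableLines and its peek()/consume() methods are hypothetical, and io.StringIO stands in for the file.

```python
import io


class PeekableLines:
    """One-line look-ahead buffer over any iterator of lines."""

    _EMPTY = object()  # sentinel meaning "nothing buffered yet"

    def __init__(self, lines):
        self._it = iter(lines)
        self._buf = self._EMPTY

    def peek(self):
        """Return the next line without consuming it, or None at the end."""
        if self._buf is self._EMPTY:
            self._buf = next(self._it, None)
        return self._buf

    def consume(self):
        """Return the next line and drop it from the buffer."""
        line = self.peek()
        if line is not None:  # stay parked on None once the input ends
            self._buf = self._EMPTY
        return line


lines = PeekableLines(io.StringIO("a\nxxx\nold\nb\n"))
out = []
while lines.peek() is not None:
    line = lines.consume()
    if "xxx" in line:
        out += ["something\n", "stg\n"]
        lines.consume()  # discard the line after the match
    else:
        out.append(line)
print("".join(out))
```

Using a private sentinel instead of None for "nothing buffered" keeps the end-of-input state distinct from the empty-buffer state, so calling consume() past the end is harmless.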