Question

I have a very large file that contains wrong information.

this
is
xxx 123gt few 1121
12345 fre 233fre
a problematic file.
It contains
xxx hy 456 efe
rtg 1215687 fwe
many errors
that I want
to forget

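I wrote a script. Whenever xxx is encountered:

That line is replaced with a custom string (something).
The next line is replaced with another custom string (stg).

Here is the script: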

subject='problematic.txt'
pattern='xxx'
subject2='resolved.txt'
output = open(subject2, 'w')
line1='something'
line2='stg'


with open(subject) as myFile:
    for num, line in enumerate(myFile, 1): #to get the line number
        if pattern in line:
            print 'found at line:', num
            line = line1 #replace the line containing xxx with 'something'
            output.write(line)
            line = next(myFile, "") # move to the next line
            line = line2 #replace the next line with 'stg'
            output.write(line)
        else:
            output.write(line) # save as is
output.close()
myFile.close()

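It works for the first occurrence of xxx, but not for the subsequent ones. The reason is that next() moves the iteration forward, so my script makes its changes in the wrong places.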

Here is the output:

found at line: 3

found at line: 6

instead of:

found at line: 3

found at line: 7

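As a result, the changes are not made in the right place... Ideally, cancelling next() after I have replaced the line with line2 would solve my problem, but I could not find a previous() function. Anyone? Thanks!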

Answer 1

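Your current code almost works. I believe it correctly identifies and replaces the right lines of the input file, but it misreports the line numbers where matches are found, because the enumerate generator does not see the skipped lines.

While you could rewrite it in various ways, as the other answers suggest, you don't need to make major changes (unless you want to, for other design reasons). Here is the code with the minimal changes, pointed out by the new comments: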

with open(subject) as myFile:
    gen = enumerate(myFile, 1)  # save the enumerate generator to a variable
    for num, line in gen:       # iterate over it, as before
        if pattern in line:
            print 'found at line:', num
            line = line1
            output.write(line)
            next(gen, None)     # advance the generator and throw away the results
            line = line2
            output.write(line)
        else:
            output.write(line)
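
To see why saving the enumerate object fixes the numbering, here is a small self-contained sketch in Python 3. The helper name replace_pairs and the sample text are made up for illustration, and io.StringIO stands in for the real file; it only aims to show that calling next() on the enumerate object keeps the counter in step with the lines consumed.

```python
import io

def replace_pairs(lines, pattern, first, second):
    """Yield (line_number, text), replacing each matching line and the one after it."""
    numbered = enumerate(lines, 1)      # keep a handle on the enumerate object
    for num, line in numbered:
        if pattern in line:
            yield num, first
            next(numbered, None)        # skip the following line; the counter advances too
            yield num + 1, second
        else:
            yield num, line

# Hypothetical sample: "xxx" sits on lines 3 and 7, as in the question.
sample = io.StringIO("this\nis\nxxx a\nb\nfile\nit\nxxx c\nd\nend\n")
hits = [num for num, text in replace_pairs(sample, "xxx", "something\n", "stg\n")
        if text == "something\n"]
print(hits)  # [3, 7]
```

Calling next() on the file itself would skip the same line but leave the counter one behind, which is the 3 / 6 output described in the question.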

Answer 2

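Whenever you think you need to look ahead, it is almost always easier to restate the problem in terms of looking back. In this case, just keep track of the previous line and check whether it matches the target string.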

infilename  = "problematic.txt"
outfilename = "resolved.txt"

pattern  = "xxx"
replace1 = "something"
replace2 = "stg"

with open(infilename) as infile:
    with open(outfilename, "w") as outfile:

        previous = ""

        for linenum, current in enumerate(infile):
            if pattern in previous:
                print "found at line", linenum
                previous, current = replace1, replace2
            if linenum:           # skip the first (blank) previous line
                outfile.write(previous)
            previous = current

        outfile.write(previous)    # write the final line

Answer 3

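You can zip the lines this way to get both pointers at once:

with open(subject) as myFile:
    lines = myFile.readlines()
    for current, following in zip(lines, lines[1:]):
         ...

EDIT: this is only meant to demonstrate the idea of zipping the lines. For large files, iterate over the file lazily instead of reading it all at once, for example with two independent iterators from itertools.tee:

from itertools import tee

with open(subject) as myFile:
    it1, it2 = tee(myFile)  # two independent cursors over the same file
    next(it2, None)         # move the second cursor one line ahead
    for current, following in zip(it1, it2):  # in Python 2, itertools.izip keeps this lazy
        ...

Note that a file is already iterable, so it needs no extra wrapping; tee is only there because the two cursors must advance independently.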
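
One thing worth checking when zipping a file with itself: a file object is its own iterator, so binding it to a second name does not create a second cursor. The sketch below (Python 3, with io.StringIO standing in for a real file, and adjacent_pairs being a hypothetical helper name) shows what the aliasing does, and one way to get overlapping pairs without loading the whole file.

```python
import io

f = io.StringIO("a\nb\nc\nd\ne\n")
alias = f                     # not a copy: both names refer to the same iterator
next(f)                       # consumes "a\n"
pairs = list(zip(alias, f))   # each pair pulls two consecutive lines from one cursor
print(pairs)  # [('b\n', 'c\n'), ('d\n', 'e\n')]

def adjacent_pairs(lines):
    """Yield (line, following_line) for every neighbouring pair, lazily."""
    it = iter(lines)
    previous = next(it, None)   # None if the input is empty
    for current in it:
        yield previous, current
        previous = current

overlapping = list(adjacent_pairs(io.StringIO("a\nb\nc\n")))
print(overlapping)  # [('a\n', 'b\n'), ('b\n', 'c\n')]
```

On Python 3.10 and later, itertools.pairwise produces the same overlapping pairs.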

Answer 4

This seems to work whether the string to be replaced appears on odd or even lines:

with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line == 'apples':          # to be replaced
            print('manzanas')         # replacement 1
            print('y más manzanas')   # replacement 2
            next(f, None)             # skip the following line (no error at end of file)
            continue
        print(line)

Sample input:

apples
pears
apples
pears
pears
apples
pears
pears

Sample output:

manzanas
y más manzanas
manzanas
y más manzanas
pears
manzanas
y más manzanas
pears
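
The same loop can be checked without a file on disk. This is only a sketch under a few assumptions: the function name translate is made up, io.StringIO stands in for test.txt, and the lines are collected in a list instead of being printed so the result can be compared with the sample output above.

```python
import io

def translate(lines, target, first, second):
    """Replace each line equal to target, and the line after it, with two new lines."""
    it = iter(lines)
    out = []
    for line in it:
        line = line.strip()
        if line == target:
            out.append(first)
            out.append(second)
            next(it, None)      # drop the line after the match; the default avoids StopIteration at EOF
            continue
        out.append(line)
    return out

sample = io.StringIO("apples\npears\napples\npears\npears\napples\npears\npears\n")
result = translate(sample, "apples", "manzanas", "y más manzanas")
print("\n".join(result))
```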

Answer 5

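There is no previous function, because that is not how the iterator protocol works. For generators in particular, the concept of a "previous" element may not even exist.

Instead, you want to iterate over the file with two cursors, zipping them together: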

from itertools import tee

with open(subject) as f:
    its = tee(f)
    next(its[1], None)  # advance the second iterator one line ahead
    for first, second in zip(*its):  # in python 2, use itertools.izip
        # do something to first and/or second, comparing them appropriately
        pass

The above works just like doing for line in f:, except that you now have the current line in first and the line immediately after it in second.
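
A runnable version of the two-cursor idea, as a sketch in Python 3. The helper name with_following is hypothetical and io.StringIO stands in for the file. One detail to be aware of: plain zip stops when the shorter iterator runs out, so the last line never shows up as first; zip_longest is used here so it is paired with None instead.

```python
import io
from itertools import tee, zip_longest

def with_following(lines):
    """Pair every line with the line after it; the last line is paired with None."""
    first, second = tee(lines)
    next(second, None)                  # put the second cursor one line ahead
    return zip_longest(first, second)   # lazy, like zip, but keeps the final line

sample = io.StringIO("one\ntwo\nthree\n")
pairs = list(with_following(sample))
print(pairs)  # [('one\n', 'two\n'), ('two\n', 'three\n'), ('three\n', None)]
```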

Answer 6

I would just set a flag indicating that you want to skip the next line, and check it in the loop instead of using next:

with open(foo) as myFile: 
  skip = False
  for line in myFile:
    if skip:
      skip = False
      continue
    if pattern in line:
      output.write("something")
      output.write("stg")
      skip = True
    else:
      output.write(line)
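
Wrapped in a generator, the flag approach is easy to try on a few lines in memory. This is a sketch only: replace_with_flag and the sample list are invented for the example, and the replacement strings carry their own newlines here so the output lines stay separate.

```python
def replace_with_flag(lines, pattern, first, second):
    """Yield the lines, swapping each matching line and the one after it."""
    skip = False
    for line in lines:
        if skip:                # this is the line right after a match: drop it
            skip = False
            continue
        if pattern in line:
            yield first
            yield second
            skip = True         # remember to drop the next line
        else:
            yield line

source = ["this\n", "xxx 123\n", "12345 fre\n", "file\n"]
result = list(replace_with_flag(source, "xxx", "something\n", "stg\n"))
print(result)  # ['this\n', 'something\n', 'stg\n', 'file\n']
```

Since the loop never calls next itself, wrapping lines in enumerate would give correct line numbers with no extra work.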

Answer 7

You need to buffer the lines somehow. For a single line this is easy to do:

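class Lines(object):

    def __init__(self, f):
        self.f = f        # file object
        self.prev = None  # buffered line, or None if nothing is buffered

    def next(self):
        # Return the buffered line, reading one from the file if needed.
        if self.prev is None:
            try:
                self.prev = next(self.f)
            except StopIteration:
                return None  # end of file
        return self.prev

    def consume(self):
        # Discard the buffered line and buffer the following one
        # (None once the file is exhausted).
        if self.prev is not None:
            self.prev = next(self.f, None)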

Now you call Lines.next() to get the next line, and Lines.consume() to consume it. A line stays buffered until it is consumed:

>>> f = open("table.py")
>>> lines = Lines(f)
>>> lines.next()
'import itertools\n'
>>> lines.next()      # same line
'import itertools\n'
>>> lines.consume()   # remove the current buffered line
>>> lines.next()
'\n'                  # next line
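
The same buffering idea can be written with separate peek and consume operations. This is a sketch in Python 3 under stated assumptions: the class name PeekableLines and its method names are hypothetical, not part of any library, and io.StringIO stands in for the file.

```python
import io

class PeekableLines:
    """One-line look-ahead buffer over any iterable of lines (illustrative only)."""

    _EMPTY = object()   # sentinel, so that an empty string can still be buffered

    def __init__(self, f):
        self._it = iter(f)
        self._buffered = self._EMPTY

    def peek(self, default=None):
        """Return the next line without consuming it, or default at end of input."""
        if self._buffered is self._EMPTY:
            self._buffered = next(self._it, self._EMPTY)
        return default if self._buffered is self._EMPTY else self._buffered

    def consume(self):
        """Return the next line and drop it from the buffer."""
        line = self.peek()
        self._buffered = self._EMPTY
        return line

lines = PeekableLines(io.StringIO("a\nb\n"))
seen = [lines.peek(), lines.peek(), lines.consume(), lines.peek()]
print(seen)  # ['a\n', 'a\n', 'a\n', 'b\n']
```

Peeking twice returns the same line; only consume moves on, which is the closest thing to "undoing" a next() that an iterator allows.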

Cancel next() function in a Python script
