Undoing the next() function in a Python script

Time: 2014-01-29 20:38:41

Tags: python function iteration next

I have a very large file that contains erroneous information.

  • this
  • xxx 123gt few 1121
  • 12345 fre 233fre
  • problematic file.
  • It contains
  • xxx hy 456 efe
  • rtg 1215687 fwe
  • many errors
  • I want to
  • forget

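I wrote a script. Whenever it encounters xxx:

  1. That line is replaced with a custom string (something).
  2. The next line is replaced with another custom string (stg).

Here is the script: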

    subject='problematic.txt'
    pattern='xxx'
    subject2='resolved.txt'
    output = open(subject2, 'w')
    line1='something'
    line2='stg'
    
    
    with open(subject) as myFile:
        for num, line in enumerate(myFile, 1): #to get the line number
            if pattern in line:
                print 'found at line:', num
                line = line1 #replace the line containing xxx with 'something'
                output.write(line)
                line = next(myFile, "") # move to the next line
                line = line2 #replace the next line with 'stg'
                output.write(line)
            else:
                output.write(line) # save as is
    output.close()
    myFile.close()
    

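It works for the first occurrence of xxx, but not for the subsequent ones. The reason is that next() moves the iteration forward, so my script makes changes at the wrong positions.

Here is the output:

    found at line: 3
    found at line: 6

instead of:

    found at line: 3
    found at line: 7

So the changes are not made in the right place. Ideally, undoing next() after replacing the line with line2 would solve my problem, but I could not find a previous() function. Anyone? Thanks!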

7 answers:

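Answer 0 (score: 2)

Your current code almost works. I believe it correctly identifies and replaces the right lines of the input file, but it misreports the line numbers where matches are found, because the enumerate generator never sees the skipped lines.

While you could rewrite it in various ways, as the other answers suggest, you don't need to make major changes (unless you want to, for other design reasons). Here is the code with the minimal change, pointed out by the new comments: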

with open(subject) as myFile:
    gen = enumerate(myFile, 1)  # save the enumerate generator to a variable
    for num, line in gen:       # iterate over it, as before
        if pattern in line:
            print 'found at line:', num
            line = line1
            output.write(line)
            next(gen, None)     # advance the generator and throw away the results
            line = line2
            output.write(line)
        else:
            output.write(line)
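
To make the line-number explanation concrete, here is a small self-contained check. It is only a sketch: it is written for Python 3, uses io.StringIO in place of a real file, and the names SAMPLE and reported_lines are made up for this illustration. The sample text is assumed to have xxx on lines 3 and 7, as in the question's expected output.

```python
import io

# Hypothetical sample: 'xxx' appears on lines 3 and 7.
SAMPLE = "a\nb\nxxx 1\nc\nd\ne\nxxx 2\nf\n"


def reported_lines(advance_enumerate):
    """Return the line numbers reported for 'xxx' matches in SAMPLE."""
    source = io.StringIO(SAMPLE)
    gen = enumerate(source, 1)
    found = []
    for num, line in gen:
        if "xxx" in line:
            found.append(num)
            if advance_enumerate:
                next(gen, None)   # enumerate counts the skipped line
            else:
                next(source, "")  # enumerate never sees the skipped line
    return found


print(reported_lines(advance_enumerate=False))  # [3, 6]
print(reported_lines(advance_enumerate=True))   # [3, 7]
```

Advancing the underlying file skips a line behind enumerate's back, so every later number is off by one per match; advancing the enumerate object keeps the counter in step.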

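Answer 1 (score: 1)

When you think you need to look ahead, it is almost always easier to restate the problem as looking back. In this case, just keep track of the previous line and check whether it matches the target string.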

infilename  = "problematic.txt"
outfilename = "resolved.txt"

pattern  = "xxx"
replace1 = "something"
replace2 = "stg"

with open(infilename) as infile:
    with open(outfilename, "w") as outfile:

        previous = ""

        for linenum, current in enumerate(infile):
            if pattern in previous:
                print "found at line", linenum
                previous, current = replace1, replace2
            if linenum:           # skip the first (blank) previous line
                outfile.write(previous)
            previous = current

        outfile.write(previous)    # write the final line
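
The same look-back idea can be sketched as a reusable generator. This is an illustrative Python 3 variant, not the answer's code: the name replace_pairs and its parameters are assumptions, and the replacement strings here end in a newline so the output stays one item per line.

```python
def replace_pairs(lines, pattern="xxx", first="something\n", second="stg\n"):
    """Yield lines, replacing each matching line and the line after it."""
    previous = None  # None means no line has been buffered yet
    for current in lines:
        if previous is not None:
            if pattern in previous:
                # Replace the matching line and the one that follows it.
                previous, current = first, second
            yield previous
        previous = current
    if previous is not None:
        yield previous  # emit the final buffered line


sample = ["a\n", "xxx 1\n", "b\n", "c\n"]
print(list(replace_pairs(sample)))
# ['a\n', 'something\n', 'stg\n', 'c\n']
```

As in the answer's code, a match on the very last line is emitted unchanged, because there is no following line to trigger the replacement.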

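Answer 2 (score: 0)

You can zip the lines like this to get two pointers at once:

with open(subject) as myFile:
    lines = myFile.readlines()
    # 'following' is used instead of 'next' to avoid shadowing the built-in
    for current, following in zip(lines, lines[1:]):
        ...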

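Edit: this is only meant to demonstrate the idea of zipping lines. For large files, iterate over the file lazily instead of reading it all into memory, for example:

from itertools import tee

with open(subject) as myFile:
    it1, it2 = tee(myFile)  # two independent iterators over the same file
    next(it2, None)         # advance the second one by one line
    for current, following in zip(it1, it2):  # in Python 2, use itertools.izip
        ...

Note that a file object is its own iterator, so binding it to a second name does not create a second cursor; zipping the file with itself would pair up non-overlapping lines. itertools.tee provides the two independent cursors.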

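Answer 3 (score: 0)

This seems to work whether the string to be replaced appears on odd or even lines:

with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line == 'apples':  # to be replaced
            print('manzanas')  # replacement 1
            print('y más manzanas')  # replacement 2
            next(f, None)  # skip the following line; the default avoids StopIteration at end of file
            continue
        print(line)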

Sample input:

apples
pears
apples
pears
pears
apples
pears
pears

Sample output:

manzanas
y más manzanas
manzanas
y más manzanas
pears
manzanas
y más manzanas
pears

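Answer 4 (score: 0)

There is no previous function, because that is not how the iterator protocol works. For generators in particular, the notion of a "previous" element may not even exist.

Instead, you want to iterate over the file with two cursors, zipping them together: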

from itertools import tee

with open(subject) as f:
    its = tee(f)
    next(its[1], None)  # advance the second iterator past the first line
    for first, second in zip(*its):  # in Python 2, use itertools.izip
        # do something to first and/or second, comparing them appropriately
        pass

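The above is just like doing for line in f:, except that you now have the current line in first and the line immediately after it in second. Because zip stops as soon as the shorter iterator runs out, the last line of the file only ever appears as second.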
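
The two-cursor idea can be tried on an in-memory list. This is a minimal Python 3 sketch; the helper name pairs is an assumption made for this example (on Python 3.10 and later, itertools.pairwise offers the same pairing).

```python
from itertools import tee


def pairs(iterable):
    """Return an iterator of (item, following_item) for adjacent items."""
    first, second = tee(iterable)
    next(second, None)  # shift the second cursor one item ahead
    return zip(first, second)


lines = ["a\n", "xxx 1\n", "b\n"]
print(list(pairs(lines)))
# [('a\n', 'xxx 1\n'), ('xxx 1\n', 'b\n')]
```

Note that the final item never shows up in the first position, so code built on this pairing has to handle the last line separately if it matters.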

Answer 5 (score: 0)

I would just set a flag indicating that you want to skip the next line, and check it in the loop instead of using next:

with open(foo) as myFile:
    skip = False
    for line in myFile:
        if skip:
            skip = False
            continue
        if pattern in line:
            output.write("something")
            output.write("stg")
            skip = True
        else:
            output.write(line)
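
For a quick way to try the flag approach without touching real files, here is a hedged Python 3 sketch using io.StringIO. The function name replace_with_flag is made up for this example, and the replacement strings are given a trailing newline, which is an assumption (the code above writes them without one).

```python
import io


def replace_with_flag(infile, outfile, pattern="xxx"):
    """Copy infile to outfile, replacing each match and the line after it."""
    skip = False
    for line in infile:
        if skip:
            skip = False  # this is the line right after a match: drop it
            continue
        if pattern in line:
            outfile.write("something\n")
            outfile.write("stg\n")
            skip = True
        else:
            outfile.write(line)


source = io.StringIO("a\nxxx 1\nb\nc\n")
result = io.StringIO()
replace_with_flag(source, result)
print(result.getvalue())
```

Because the loop never calls next() itself, wrapping the file in enumerate would report correct line numbers with no further changes.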

Answer 6 (score: 0)

You need to buffer the lines somehow. For a single line this is easy to do:

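class Lines(object):

    def __init__(self, f):
        self.f = f        # file object
        self.prev = None  # buffered line (None when the buffer is empty)

    def next(self):
        # Return the buffered line, reading one from the file if needed.
        if self.prev is None:
            try:
                self.prev = next(self.f)
            except StopIteration:
                return None  # end of file
        return self.prev

    def consume(self):
        # Drop the buffered line and buffer the following one
        # (None once the file is exhausted).
        if self.prev is not None:
            self.prev = next(self.f, None)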

Now you call Lines.next() to get the next line and Lines.consume() to consume it. A line stays buffered until it is consumed:

>>> f = open("table.py")
>>> lines = Lines(f)
>>> lines.next()
'import itertools\n'
>>> lines.next()      # same line
'import itertools\n'
>>> lines.consume()   # remove the current buffered line
>>> lines.next()
'\n'                  # next line
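
A Python 3 flavoured variant of the same one-line buffer is sketched below. It is an illustration rather than the class above: the name PeekableLines and the peek method are assumptions, and it implements the regular iterator protocol so it can be used directly in a for loop.

```python
class PeekableLines:
    """Iterator wrapper that can look at the next line without consuming it."""

    _EMPTY = object()  # sentinel: nothing is buffered

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def peek(self, default=None):
        """Return the next line without consuming it, or default at the end."""
        if self._buffer is self._EMPTY:
            self._buffer = next(self._it, self._EMPTY)
        return default if self._buffer is self._EMPTY else self._buffer

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer is not self._EMPTY:
            # Hand out the buffered line and clear the buffer.
            line, self._buffer = self._buffer, self._EMPTY
            return line
        return next(self._it)


lines = PeekableLines(["a\n", "b\n"])
print(repr(lines.peek()))  # 'a\n'
print(repr(next(lines)))   # 'a\n'
print(list(lines))         # ['b\n']
print(lines.peek())        # None
```

Peeking gives the look-ahead the question was after, so there is no need to undo a call to next().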