I have the following input file structure, with text on every line:
line1
line2
line3
line3
line4
line5
line6
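When two consecutive lines are identical, as with line3 here, I want to keep the second one and change the content of the first to "SECTION MISSING". I cannot get the replacement to land in the right position. The closest I have come is the code below, but the output I get is: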
line1
line2
line3
SECTION MISSING
line4
etc.
whereas I want:
line1
line2
SECTION MISSING
line3
line4
Code:
def uniq(iterator):
    previous = float("NaN")  # Not equal to anything
    section = "SECTION : MISSING\n"
    for value in iterator:
        if previous == value:
            yield section
        else:
            yield value
        previous = value
    return

with open('infile.txt', 'r') as file:
    with open('outfile.txt', 'w') as f:
        for line in uniq(file):
            f.write(line)
Answer 0 (score: 5)
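I think you want to yield previous rather than value, so that each line is written one step late, once you know whether the next line repeats it: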
def uniq(iterator):
    previous = None
    section = "SECTION : MISSING\n"
    for value in iterator:
        if previous == value:
            yield section
        elif previous is not None:
            yield previous
        previous = value
    if previous is not None:
        yield previous
Usage example:
>>> list(uniq([1, 2, 2, 3, 4, 5, 6, 6]))
[1, 'SECTION : MISSING\n', 2, 3, 4, 5, 'SECTION : MISSING\n', 6]
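To check this against the data from the question, here is a small sketch that runs the same generator over the sample lines. The `io.StringIO` object is only a stand-in for `infile.txt` (an assumption made so the example runs without touching the disk), and the function is repeated so the snippet is self-contained.

```python
import io


def uniq(iterator):
    """Yield each item one step late; replace the first of two equal neighbours."""
    previous = None
    section = "SECTION : MISSING\n"
    for value in iterator:
        if previous == value:
            yield section
        elif previous is not None:
            yield previous
        previous = value
    if previous is not None:
        yield previous


# Stand-in for infile.txt, filled with the sample data from the question.
sample = io.StringIO("line1\nline2\nline3\nline3\nline4\nline5\nline6\n")
result = "".join(uniq(sample))
print(result, end="")
```

The marker now appears before the surviving copy of line3. Note that with three identical lines in a row, this version replaces every copy except the last one.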
Answer 1 (score: 2)
Something like this:
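prev = None
with open('infile.txt', 'r') as fi:
    with open('outfile.txt', 'w') as fo:
        for line in fi:
            if prev is not None:
                # Write the previous line, or the marker if the current line repeats it
                fo.write(prev if prev != line else "SECTION : MISSING\n")
            prev = line
        if prev is not None:  # guard against an empty input file
            fo.write(prev)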
This will give you the output file you are looking for:
line1
line2
SECTION : MISSING
line3
line4
line5
line6
Answer 2 (score: 0)
As a matter of personal preference for tasks like this, I use two cursors instead of one:
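from itertools import tee, zip_longest  # Python 3; on Python 2 use izip_longest

# infile and outfile are path strings defined elsewhere
with open(infile) as r, open(outfile, 'w') as w:
    p, c = tee(r)
    next(c, None)  # advance c so it runs one line ahead of p
    for prev, cur in zip_longest(p, c):
        # Write the marker in place of the first copy of a repeated line;
        # the last pair is (last_line, None), so the last line is always kept
        w.write(prev if prev != cur else 'SECTION : MISSING\n')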
Answer 3 (score: 0)
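If you have to handle runs of three or more consecutive identical lines (not just two) and only want to replace the first line of each run, you can use groupby: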
from itertools import groupby, islice, chain

def detect_missing(source):
    grouped = groupby(source)
    section = "SECTION: MISSING\n"
    for _, group in grouped:
        first_two = list(islice(group, 2))
        if len(first_two) > 1:
            first_two[0] = section
        yield from chain(first_two, group)
(This is Python 3, but yield from can be removed if needed.)
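As an illustration of the remark about `yield from`, the sketch below swaps it for an explicit loop, which should also run on Python 2, and feeds the generator a run of three identical lines. The input list is made up for the example.

```python
from itertools import chain, groupby, islice


def detect_missing(source):
    """Replace only the first line of each run of identical lines."""
    section = "SECTION: MISSING\n"
    for _, group in groupby(source):
        # Pull at most two items to find out whether the run has a repeat.
        first_two = list(islice(group, 2))
        if len(first_two) > 1:
            first_two[0] = section
        # Explicit loop instead of `yield from`; `group` holds the rest of the run.
        for item in chain(first_two, group):
            yield item


# Made-up input containing a run of three identical lines.
lines = ["a\n", "b\n", "b\n", "b\n", "c\n"]
result = list(detect_missing(lines))
print(result)
```

Only the first of the three "b" lines is replaced here; the other two pass through unchanged.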