In Python

Date: 2016-11-30 16:42:51

Tags: python

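I wrote a piece of Python code that copies an existing text file (.txt) to a new file with a different name in the same location. It copies all of the text from the original file, as expected: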

a = open("file1.txt", "r")  # existing file
b = open("file2.txt", "w")  # file did not previously exist, hence "w"
for reform1 in a.readlines():  # read the lines in the file
    b.write(reform1)  # write each line to the new file
a.close()  # close file a (file1)
b.close()  # close file b (file2)

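I have now been asked to modify the new file so that duplicate lines and blank lines are removed from the copy (the original file must stay unchanged), while the rest of the text (the unique lines) is kept. How can I do this?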

3 answers:

Answer 0: (score: 2)

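This writes every line from 'file1.txt' to 'file2.txt' except lines that consist only of whitespace or that duplicate an earlier line. Order is preserved, on the assumption that only the first occurrence of a duplicated line should be written to the copy: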

seen = set()  # lines already written to the copy
with open('file1.txt') as f, open('file2.txt', 'w') as o:
    for line in f:
        # skip whitespace-only lines and lines seen before
        if not line.isspace() and line not in seen:
            o.write(line)
            seen.add(line)

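Note that str.isspace() is True for a line made up of any whitespace characters (tabs, for example), not just a newline. For a stricter definition of a blank line, use if not line == '\n' instead (assuming there are no '\r' line endings).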
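
To make the difference between the two checks concrete, here is a small sketch that applies both to a few sample lines. The sample strings and the helper names is_blank_loose and is_blank_strict are made up for illustration and are not part of the answer's code:

```python
def is_blank_loose(line):
    # True for any non-empty line that contains only whitespace characters
    return line.isspace()


def is_blank_strict(line):
    # True only for a line that is exactly one newline character
    return line == '\n'


# Hypothetical sample lines: empty line, spaces, a tab, and real text
samples = ['\n', '   \n', '\t\n', 'text\n']
for s in samples:
    print(repr(s), is_blank_loose(s), is_blank_strict(s))
```

With the loose check, lines holding only spaces or tabs are dropped as well; with the strict check they would be kept and written to the copy.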

I use the with statement to handle opening and closing the files, and I read the file line by line, which is the most Pythonic way to do it.

If all you want is to copy a file in Python, you should use the shutil module instead.
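
For a plain copy with no filtering, a minimal sketch using shutil.copyfile might look like the following. The helper name copy_file and the temporary directory are assumptions made so the example can run without touching real files; in the question's setting you would pass "file1.txt" and "file2.txt" directly:

```python
import os
import shutil
import tempfile


def copy_file(src, dst):
    # Hypothetical helper: copy the contents of src to dst,
    # overwriting dst if it already exists
    shutil.copyfile(src, dst)


# Demonstrate inside a temporary directory (an assumption for this sketch)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file1.txt")
    dst = os.path.join(tmp, "file2.txt")
    with open(src, "w") as f:
        f.write("first line\nsecond line\n")
    copy_file(src, dst)
    with open(dst) as f:
        copied = f.read()

print(copied == "first line\nsecond line\n")
```

Note that shutil.copyfile copies everything unchanged, so it does not help with removing duplicate or blank lines.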

Answer 1: (score: 1)

Try this:

import re
a = open("file1.txt", "r")  # existing file
b = open("file2.txt", "w")  # file did not previously exist, hence "w"
exists = set()  # lines already written
for reform1 in a.readlines():
    if reform1 in exists:
        continue  # duplicate line
    elif re.match(r'^\s*$', reform1):
        continue  # blank line (\s* also covers runs of spaces or tabs)
    else:
        b.write(reform1)  # write the line held in 'reform1'
        exists.add(reform1)
a.close()  # close file a (file1)
b.close()  # close file b (file2)

Answer 2: (score: 0)

Try:

a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
seen = []
for reform1 in a.readlines():
    if reform1 not in seen and len(reform1) > 1:
        b.write(reform1) #write the lines from 'reform1'
        seen.append(reform1)
a.close() #close file a (file1)
b.close() #close file b (file2)

I use "len(reform1) > 1" because when I created my test file, the blank lines contained 1 character, presumably "\r" or possibly "\n". Adjust this to suit your application.
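
One possible adjustment, if blank lines in your file may also contain spaces, tabs, or a "\r\n" ending (all longer than 1 character), is to test the stripped line instead of its length. The sketch below assumes that whitespace-only lines count as blank; the function name unique_nonblank_lines and the sample data are hypothetical, and the function works on any iterable of lines so it can be tried without files:

```python
def unique_nonblank_lines(lines):
    # Yield each line once, in original order, skipping whitespace-only lines
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # empty after stripping, so treat as blank
        if line in seen:
            continue  # already yielded earlier
        seen.add(line)
        yield line


# Hypothetical sample input, including several kinds of blank line
sample = ["a\n", "\n", "  \n", "b\n", "a\n", "\r\n"]
result = list(unique_nonblank_lines(sample))
print(result)
```

To apply it to the files from the question, you could pass the open input file as lines and write each yielded line to the output file.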