Question

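I wrote some Python code that copies an existing text file (.txt) to a new file with a different name in the same location. It copies all of the text from the original file as expected: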

a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
for reform1 in a.readlines():
    b.write(reform1) #write the lines from 'reform1'
    reform1=a.readlines() #read the lines in the file
a.close() #close file a (file1)
b.close() #close file b (file2)

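I have now been asked to modify the new file so that duplicate lines and blank lines are removed from the copy, while the original file stays untouched and the rest of the text (the unique lines) is kept. How do I do this?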

Answer 1

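This writes every line of 'file1.txt' to 'file2.txt' except lines that consist only of whitespace or that are duplicates. Order is preserved, on the assumption that only the first instance of a repeated line should be written to the copy: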

seen = set()
with open('file1.txt') as f, open('file2.txt','w') as o:
    for line in f:
        if not line.isspace() and line not in seen:
            o.write(line)
            seen.add(line)

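Note that str.isspace() is True for any whitespace (tabs, for example), not just newlines. For a stricter definition of a blank line, use if line != '\n' instead (assuming there are no '\r' line endings).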
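To make the difference between the two blank-line tests concrete, here is a small sketch. The sample strings are made up for illustration and do not come from the question's files.

```python
# Hypothetical sample lines, as they would appear when iterating over a file.
samples = ["\n", " \t\n", "text\n"]

# str.isspace() is True when every character in the string is whitespace.
loose = [s.isspace() for s in samples]

# Comparing against '\n' treats only a completely empty line as blank.
strict = [s == "\n" for s in samples]

print(loose)
print(strict)
```

The second sample (a space and a tab) counts as blank under the loose test but not under the strict one, so the choice depends on whether whitespace-only lines should survive in the copy.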

I use a with statement to handle opening and closing the files, and I read the file line by line, which is the most Pythonic approach.

If you only need to copy a file in Python, you should use shutil instead.
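As a minimal sketch of the plain-copy case, the following uses shutil.copyfile from the standard library. The temporary directory and the sample text are assumptions added so the example can run without touching real files.

```python
import os
import shutil
import tempfile

# Work in a temporary directory (an assumption for this sketch).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file1.txt")
dst = os.path.join(workdir, "file2.txt")

# Create a small source file to copy.
with open(src, "w") as f:
    f.write("first line\nsecond line\n")

# copyfile copies the file contents only, without filtering any lines.
shutil.copyfile(src, dst)

with open(dst) as f:
    copied = f.read()

# Remove the temporary directory again.
shutil.rmtree(workdir)
```

This only duplicates the file as-is; removing blank or repeated lines still needs a loop like the one above.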

Answer 2

Try this:

import re
a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
exists = set()
for reform1 in a.readlines():
    if reform1 in exists:
        continue
    elif re.match(r'^\s*$', reform1): #skip lines that are empty or contain only whitespace
        continue
    else:
        b.write(reform1) #write the lines from 'reform1'
        exists.add(reform1)
a.close() #close file a (file1)
b.close() #close file b (file2)

Answer 3

Try:

a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
seen = set() #lines already written; a set keeps the membership test fast
for reform1 in a.readlines():
    if reform1 not in seen and len(reform1) > 1:
        b.write(reform1) #write the lines from 'reform1'
        seen.add(reform1)
a.close() #close file a (file1)
b.close() #close file b (file2)

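I use len(reform1) > 1 because, when I created my test file, the empty lines contained 1 character, presumably a "\r" or possibly a "\n" character. Adjust this as needed for your application.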
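If it is unclear which line-ending character the file uses, one option is to strip the line ending before testing instead of relying on the length. The sketch below is a variation rather than the code above: dedupe_lines is a hypothetical helper name, and it assumes that whitespace-only lines count as blank and that every kept line should end in "\n".

```python
def dedupe_lines(lines):
    """Return the non-blank lines, keeping only the first occurrence of each."""
    seen = set()
    kept = []
    for line in lines:
        content = line.rstrip("\r\n")  # drop '\n', '\r\n' or a stray '\r'
        if content.strip() == "" or content in seen:
            continue  # skip blank lines and lines already written
        seen.add(content)
        kept.append(content + "\n")  # assumption: normalize endings to '\n'
    return kept

# Made-up input mixing line endings, including a last line without a newline.
result = dedupe_lines(["a\n", "\r\n", "a\r\n", "b", " \n", "b\n"])
print(result)
```

Because the comparison is made on the stripped content, "a\n" and "a\r\n" are treated as the same line, and a final line without a trailing newline is still recognized as a duplicate.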
