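I wrote some Python code that copies an existing text file (.txt) to a new file with a different name in the same location. It copies all of the text from the original file as expected: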
a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
for reform1 in a.readlines(): #read the lines in the file
    b.write(reform1) #write the lines from 'reform1'
a.close() #close file a (file1)
b.close() #close file b (file2)
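I have now been asked to modify the new file so that duplicate lines and blank lines are removed from the copy, while the original file stays untouched and the rest of the text (the unique lines) is kept. How can I do this?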
Answer 0 (score: 2)
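This writes every line from 'file1.txt' to 'file2.txt' except lines that consist solely of whitespace or that are duplicates. Order is preserved, on the assumption that only the first occurrence of a repeated line should be written to the copy: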
seen = set()
with open('file1.txt') as f, open('file2.txt', 'w') as o:
    for line in f:
        if not line.isspace() and line not in seen:
            o.write(line)
            seen.add(line)
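Note that str.isspace() is True for any line made up only of whitespace (tabs, for example), not just a bare newline. For a stricter definition of a blank line, use if not line == '\n' instead (assuming there are no '\r' line endings).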
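To make the difference between the two tests concrete, here is a small sketch that applies both to a few sample lines. The helper names is_blank_loose and is_blank_strict are made up for this illustration and are not part of the answer above.

```python
# Sample lines as they would come out of a file, each ending in a newline.
samples = ["\n", "   \n", "\t\n", "text\n"]

def is_blank_loose(line):
    # True for any line that contains only whitespace characters
    return line.isspace()

def is_blank_strict(line):
    # True only for a line that is exactly one newline character
    return line == "\n"

loose = [is_blank_loose(s) for s in samples]
strict = [is_blank_strict(s) for s in samples]

print(loose)   # [True, True, True, False]
print(strict)  # [True, False, False, False]
```

With the strict test, lines that hold only spaces or tabs count as real content and would be written to the copy.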
I used a with statement to handle opening and closing the files, and I read the file line by line, which is the most Pythonic way to do it.
If you only want to copy a file in Python, you should use shutil.
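As a minimal sketch of the shutil approach, the example below uses shutil.copyfile from the standard library. The wrapper name copy_text_file and the temporary directory are assumptions made for the demonstration; in practice you would pass your own paths.

```python
import os
import shutil
import tempfile

def copy_text_file(src, dst):
    # shutil.copyfile copies the contents of src into dst and returns dst
    return shutil.copyfile(src, dst)

# Work inside a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file1.txt")
    dst = os.path.join(tmp, "file2.txt")
    with open(src, "w") as f:
        f.write("first line\nsecond line\n")
    copy_text_file(src, dst)
    with open(dst) as f:
        copied = f.read()

print(copied == "first line\nsecond line\n")  # True
```

This copies the file unchanged, so it covers the original copying step but not the removal of duplicate or blank lines.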
Answer 1 (score: 1)
Try this:
import re
a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
exists = set()
for reform1 in a.readlines():
    if reform1 in exists: #skip lines that were already written
        continue
    elif re.match(r'^\s*$', reform1): #skip lines that contain only whitespace
        continue
    else:
        b.write(reform1) #write the lines from 'reform1'
        exists.add(reform1)
a.close() #close file a (file1)
b.close() #close file b (file2)
Answer 2 (score: 0)
Try:
a=open("file1.txt", "r") #existing file
b=open("file2.txt", "w") #file did not previously exist, hence "w"
seen = []
for reform1 in a.readlines():
    if reform1 not in seen and len(reform1) > 1:
        b.write(reform1) #write the lines from 'reform1'
        seen.append(reform1)
a.close() #close file a (file1)
b.close() #close file b (file2)
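I used len(reform1) > 1 because when I created my test file, the blank lines contained one character, presumably "\r" or possibly "\n". Adjust this as needed for your application.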
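One thing to watch for: the answers here compare lines together with their line endings, so a final line that has no trailing newline will not match an earlier identical line. The sketch below is one possible way around that, comparing lines with the ending stripped. The function name unique_nonblank_lines is hypothetical, and treating whitespace-only lines as blank is an assumption you may want to change.

```python
def unique_nonblank_lines(lines):
    """Yield each non-blank line once, in first-seen order, ending in a newline."""
    seen = set()
    for line in lines:
        content = line.rstrip("\r\n")  # compare without the line ending
        if not content.strip() or content in seen:
            continue  # skip blank lines and lines already written
        seen.add(content)
        yield content + "\n"

# The last "a" has no newline but is still recognised as a duplicate.
sample = ["a\n", "\n", "b\n", "   \n", "a"]
result = list(unique_nonblank_lines(sample))
print(result)  # ['a\n', 'b\n']
```

Because the function accepts any iterable of lines, an open file object can be passed in directly and the result handed to the writelines method of the output file.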