Question

Notepad++ or Python

How can I remove the lines that are the same in two text files? For example, if text file 1 contains

 text123    
 text1234    
 text12345@    
 text12

and textfile2 contains

text123   
text 00   
text 001   
text 12

the output should be

text 00   
text 001

Just find the lines in textfile2 that duplicate lines in textfile1, and output only the text that does not exist in text file 1.

Answer 1

This solution avoids keeping the entire contents of the second file in memory:

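# Collect the lines of the first file in a set for fast membership tests
with open('textfile1.txt', 'r') as f:
    bad_lines = set(f)

# Stream the second file line by line instead of loading it all at once
with open('textfile2.txt', 'r') as f:
    for line in f:
        if line not in bad_lines:
            print(line, end='')  # each line already ends with a newline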
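
Lines that differ only in surrounding whitespace, or in a missing final newline, compare as different in an exact string comparison. A minimal sketch, assuming such differences should be ignored; the helper name `lines_not_in` is hypothetical and not part of the answer above:

```python
def lines_not_in(reference_path, candidate_path):
    """Yield stripped lines of candidate_path that are absent from reference_path."""
    with open(reference_path, 'r') as ref:
        # Assumption: leading/trailing whitespace is not significant
        seen = {line.strip() for line in ref}
    with open(candidate_path, 'r') as cand:
        for line in cand:
            text = line.strip()
            # Skip blank lines and anything already present in the reference file
            if text and text not in seen:
                yield text
```

It could be used as `for text in lines_not_in('textfile1.txt', 'textfile2.txt'): print(text)`. Only the first file is held in memory; the second is still read one line at a time.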

Answer 2

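# Read both files into lists of lines
txt1 = []
with open('file1.txt','r') as f:
    for l in f:
        txt1.append(l)
txt2 = []
with open('file2.txt','r') as f:
    for l in f:
        txt2.append(l)
# Keep the lines of file2 that do not appear in file1
ans = [line for line in txt2 if line not in txt1]
print(ans)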

Update based on the comment from ethans:

with open('file1.txt','r') as f:
    txt1 = f.readlines()
txt2 = []
with open('file2.txt','r') as f:
    for l in f:
        if l not in txt1:
            txt2.append(l)
print(*txt2, sep='', end='')  # lines already end with a newline
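
Testing `l not in txt1` against a list scans the list for every line, while a set gives average constant-time lookups. A minimal sketch, assuming the result should be written to a third file rather than printed; the function name `write_difference` and the `out_path` parameter are hypothetical:

```python
def write_difference(path1, path2, out_path):
    """Write the lines of path2 that do not occur in path1 to out_path.

    Returns the number of lines written.
    """
    with open(path1, 'r') as f:
        known = set(f)  # iterating a file object yields its lines
    kept = 0
    with open(path2, 'r') as src, open(out_path, 'w') as dst:
        for line in src:
            if line not in known:
                dst.write(line)
                kept += 1
    return kept
```

For example, `write_difference('file1.txt', 'file2.txt', 'result.txt')` would leave both input files untouched and put the remaining lines in `result.txt`.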

Answer 3

You can use a set to find the unique entries:

list1 = []
list2 = []

# file1 and file2 are the paths of the two text files
with open(file1) as f1:
  for line in f1:
    list1.append(line)

with open(file2) as f2:
  for line in f2:
    list2.append(line)

print('unique elements in f1 and not in f2 = {}'.format(set(list1) - set(list2)))
print('unique elements in f2 and not in f1 = {}'.format(set(list2) - set(list1)))
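
A set difference does not preserve the order in which the lines appeared. A minimal sketch of an order-preserving variant, assuming first-seen order matters; the function name `ordered_difference` is hypothetical and the sample lists reuse the data from the question:

```python
def ordered_difference(first, second):
    """Return the unique items of `second` that are absent from `first`, in first-seen order."""
    exclude = set(first)
    # A dict keeps insertion order (Python 3.7+), so it serves as an ordered set here
    return list(dict.fromkeys(item for item in second if item not in exclude))

list1 = ['text123', 'text1234', 'text12345@', 'text12']
list2 = ['text123', 'text 00', 'text 001', 'text 12', 'text 00']
print(ordered_difference(list1, list2))  # ['text 00', 'text 001', 'text 12']
```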

Answer 4

You can also use pandas:

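import pandas as pd

# file1 and file2 are the paths of the two text files; each line becomes one row
df = pd.read_table(file1, names=['id'])
df1 = pd.read_table(file2, names=['id'])

# Keep the rows of file2 whose value does not occur anywhere in file1
df1[~df1['id'].isin(df['id'])]['id'].tolist()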

['text 00', 'text 001']

Answer 5

with open(file1) as f1, open(file2) as f2:
    for f1_line, f2_line in zip(f1, f2):
        if f1_line != f2_line:
            print(f2_line, end='')

For example, a complete working example:

from io import StringIO

f1 = StringIO("""text123
text1234
text12345@
text12""")

f2 = StringIO("""text123
text 00
text 001
text 12""")

for f1_line, f2_line in zip(f1, f2):
    if f1_line != f2_line:
        print(f2_line, end='')

Output:

text 00
text 001
text 12
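
The `zip` loop compares line N of one file with line N of the other, so its result depends on line order. A minimal sketch of how that differs from a membership test, using made-up sample data that is not from the question:

```python
from io import StringIO

f1_text = "alpha\nbeta\ngamma\n"
f2_text = "beta\nalpha\ndelta\n"  # the first two lines are swapped

# Positional comparison, as with zip(): every position differs here
positional = [b for a, b in zip(StringIO(f1_text), StringIO(f2_text)) if a != b]

# Membership comparison: only lines that occur nowhere in the first text
known = set(StringIO(f1_text))
membership = [line for line in StringIO(f2_text) if line not in known]

print(positional)  # ['beta\n', 'alpha\n', 'delta\n']
print(membership)  # ['delta\n']
```

Note also that `zip` stops at the end of the shorter input, so extra lines in the longer file are never compared.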

Remove identical text from 2 text files