Remove identical text from 2 text files

Time: 2019-01-11 23:43:04

Tags: python notepad++

Notepad++ or Python

How can I remove identical lines? For example, if text file 1 contains
 text123    
 text1234    
 text12345@    
 text12

and text file 2 contains

text123   
text 00   
text 001   
text 12  

the output should be

text 00   
text 001

In other words, find the lines of textfile2 that also appear in textfile1, and output only the lines that do not exist in text file 1.

5 answers:

Answer 0: (score: 3)

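This solution avoids keeping the entire contents of the second file in memory. Only the lines of the first file are stored, in a set, so each lookup is fast.

with open('textfile1.txt', 'r') as f:
    # Lines of the first file; any of these found in the second file is dropped
    bad_lines = set(f.readlines())

with open('textfile2.txt', 'r') as f:
    # Iterate over the file object so that only one line is held at a time
    for line in f:
        if line not in bad_lines:
            # Each line already ends with a newline
            print(line, end='')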
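
The sample lines in the question carry leading and trailing spaces, so an exact comparison can miss lines that look identical. Below is a minimal sketch of the same idea with whitespace stripped before comparing. The helper name `lines_not_in` and the inline lists are hypothetical stand-ins for the two files, not part of the original answer.

```python
def lines_not_in(first_lines, second_lines):
    """Yield stripped lines of second_lines that do not occur in first_lines."""
    # Assumption: leading/trailing whitespace is not significant
    seen = {line.strip() for line in first_lines}
    for line in second_lines:
        text = line.strip()
        # Skip blank lines and lines already present in the first file
        if text and text not in seen:
            yield text


# Hypothetical in-memory stand-ins for textfile1.txt and textfile2.txt
file1_lines = [" text123    \n", " text1234    \n", " text12345@    \n", " text12\n"]
file2_lines = ["text123   \n", "text 00   \n", "text 001   \n", "text 12  \n"]

result = list(lines_not_in(file1_lines, file2_lines))
print(result)
```

With this sample data, `text 12` is kept because it differs from `text12` by an inner space; only surrounding whitespace is ignored here.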

Answer 1: (score: 1)

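txt1 = []  # lines of the first file (was missing in the original snippet)
with open('file1.txt','r') as f:
    for l in f:
        txt1.append(l)
txt2 = []
with open('file2.txt','r') as f:
    for l in f:
        txt2.append(l)
# Keep the lines of file2 that do not appear in file1
ans = [line for line in txt2 if line not in txt1]
print(ans)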

Update based on ethan's comment:

with open('file1.txt','r') as f:
    txt1 = f.readlines()
txt2 = []
with open('file2.txt','r') as f:
    for l in f:
        if l not in txt1:
            txt2.append(l)
print(*txt2, sep='')  # lines keep their newlines, so no extra separator is needed

Answer 2: (score: 0)

You can use set to find the unique entries:

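# file1 and file2 are the paths of the two text files
list1 = []
list2 = []

with open(file1) as f1:
  for line in f1:
    list1.append(line)

with open(file2) as f2:
  for line in f2:
    list2.append(line)

# Set difference: the result has no duplicates and no guaranteed order
print('unique elements in f1 and not in f2 = {}'.format(set(list1) - set(list2)))
print('unique elements in f2 and not in f1 = {}'.format(set(list2) - set(list1)))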
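
A set difference discards both the original line order and repeated lines. As a small sketch of what that means in practice, the example below uses inline lists instead of files; the repeated `text 00` line is a hypothetical addition for illustration, and sorting is just one way to get a stable order.

```python
# Hypothetical in-memory stand-ins for the two files
list1 = ["text123\n", "text1234\n", "text12345@\n", "text12\n"]
list2 = ["text123\n", "text 00\n", "text 001\n", "text 12\n", "text 00\n"]

# Lines present in list2 but not in list1; the duplicate collapses to one entry
only_in_2 = set(list2) - set(list1)

# Sets are unordered, so sort to get a reproducible listing
ordered = sorted(line.rstrip("\n") for line in only_in_2)
print(ordered)
```

If the original order or the duplicates matter, filtering the second list against a set of the first file's lines is the safer choice.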

Answer 3: (score: 0)

You can also use pandas

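import pandas as pd

# file1 and file2 are the paths of the two text files
df = pd.read_table(file1, names=['id'])
df1 = pd.read_table(file2, names=['id'])

# Compare column values; DataFrame.isin(DataFrame) would match by row index instead
df1[~df1['id'].isin(df['id'])]['id'].values.tolist()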

['text 00', 'text 001']

Answer 4: (score: 0)

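with open(file1) as f1, open(file2) as f2:
    # Note: zip compares the files position by position (line 1 with line 1, ...)
    for f1_line, f2_line in zip(f1, f2):
        if f1_line != f2_line:
            print(f2_line, end='')

For example, a complete working example: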

from io import StringIO

f1 = StringIO("""text123
text1234
text12345@
text12""")

f2 = StringIO("""text123
text 00
text 001
text 12""")

for f1_line, f2_line in zip(f1, f2):
    if f1_line != f2_line:
        print(f2_line, end='')

Output:

text 00
text 001
text 12
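
Because `zip` pairs lines by position, it answers "does line N differ?" rather than "does this line occur anywhere in the first file?", and it stops at the end of the shorter file. The sketch below contrasts the two on hypothetical sample data where the second file has the same lines in a different order plus one extra line.

```python
from io import StringIO

# Hypothetical sample data: same lines in a different order, plus one new line
f1_text = "text123\ntext12\n"
f2_text = "text12\ntext123\ntext 00\n"

# Positional comparison, as in the zip-based answer
positional = [
    f2_line
    for f1_line, f2_line in zip(StringIO(f1_text), StringIO(f2_text))
    if f1_line != f2_line
]

# Membership comparison: keep lines of the second file absent from the first
known = set(StringIO(f1_text))
by_membership = [line for line in StringIO(f2_text) if line not in known]

print(positional)
print(by_membership)
```

Here the positional version reports both reordered lines and never reaches `text 00`, while the membership version reports only `text 00`.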