How to remove identical lines (Notepad++ or Python)
For example, if text file 1 contains
text123
text1234
text12345@
text12
and textfile2 contains
text123
text 00
text 001
text 12
the output should be
text 00
text 001
Just find the lines in textfile2 that duplicate lines in textfile1, then output only the lines that do not exist in text file 1.
Answer 0 (score: 3)
This solution avoids keeping the entire contents of the second file in memory
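with open('textfile1.txt', 'r') as f:
    # Lines of the first file, kept in a set for fast membership tests
    bad_lines = set(f)
with open('textfile2.txt', 'r') as f:
    # Iterate lazily so the second file is never fully loaded into memory
    for line in f:
        if line not in bad_lines:
            # Each line already ends with a newline
            print(line, end='')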
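One edge case when comparing raw lines: if the last line of the first file has no trailing newline, it will not compare equal to the same text followed by a newline in the second file. Here is a minimal sketch that strips line endings before comparing, assuming that ignoring them is acceptable. The function name `lines_only_in_second` and its parameters are hypothetical, not part of the answer above.

```python
def lines_only_in_second(first_path, second_path):
    """Yield the text of each line in second_path that does not occur in first_path.

    Hypothetical helper: trailing newlines are stripped before comparing,
    so a missing newline at the end of a file does not cause a mismatch.
    """
    with open(first_path, 'r') as f:
        seen = {line.rstrip('\n') for line in f}
    with open(second_path, 'r') as f:
        # The second file is read one line at a time
        for line in f:
            text = line.rstrip('\n')
            if text not in seen:
                yield text
```

It could be used as `for text in lines_only_in_second('textfile1.txt', 'textfile2.txt'): print(text)`.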
Answer 1 (score: 1)
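# Collect every line of the first file
txt1 = []
with open('file1.txt','r') as f:
    for l in f:
        txt1.append(l)
# Collect every line of the second file
txt2 = []
with open('file2.txt','r') as f:
    for l in f:
        txt2.append(l)
# Keep only the lines of the second file that are absent from the first
ans = [line for line in txt2 if line not in txt1]
print(ans)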
Update based on ethan's comment:
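with open('file1.txt','r') as f:
    txt1 = f.readlines()
txt2 = []
with open('file2.txt','r') as f:
    for l in f:
        # Keep the line only if it does not appear in the first file
        if l not in txt1:
            txt2.append(l)
# Each line already ends with a newline, so no extra separator is needed
print(*txt2, sep='')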
Answer 2 (score: 0)
You can use set to find the unique entries:
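# file1 and file2 are assumed to hold the paths of the two text files
list1 = []
list2 = []
with open(file1) as f1:
    for line in f1:
        list1.append(line)
with open(file2) as f2:
    for line in f2:
        list2.append(line)
# Set difference: entries present on the left side but not on the right
print('unique elements in f1 and not in f2 = {}'.format(set(list1) - set(list2)))
print('unique elements in f2 and not in f1 = {}'.format(set(list2) - set(list1)))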
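To make the set operations concrete, here is a small sketch that uses the question's sample data as in-memory lists instead of files (the variable names are only for illustration). Sets are unordered and drop duplicates, so `sorted()` is used to get a stable order for printing.

```python
# Sample lines from the question, without trailing newlines
list1 = ['text123', 'text1234', 'text12345@', 'text12']
list2 = ['text123', 'text 00', 'text 001', 'text 12']

# Difference: entries in the left set that are missing from the right set
only_in_2 = set(list2) - set(list1)
only_in_1 = set(list1) - set(list2)

# Symmetric difference: entries that appear in exactly one of the two sets
in_exactly_one = set(list1) ^ set(list2)

print(sorted(only_in_2))  # ['text 00', 'text 001', 'text 12']
print(sorted(only_in_1))  # ['text12', 'text1234', 'text12345@']
```

If the original order of the second file matters, a list comprehension that tests membership in `set(list1)` keeps it, at the cost of not removing duplicates.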
Answer 3 (score: 0)
You can also use pandas:
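import pandas as pd
# file1 and file2 are assumed to hold the paths of the two text files
df = pd.read_table(file1, names=['id'])
df1 = pd.read_table(file2, names=['id'])
# Keep the rows of the second file whose value does not occur in the first
df1.loc[~df1['id'].isin(df['id']), 'id'].tolist()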
['text 00', 'text 001']
Answer 4 (score: 0)
with open(file1) as f1, open(file2) as f2:
    # Compare the lines found at the same position in both files
    for f1_line, f2_line in zip(f1, f2):
        if f1_line != f2_line:
            print(f2_line, end='')
For example, a complete working example:
from io import StringIO
f1 = StringIO("""text123
text1234
text12345@
text12""")
f2 = StringIO("""text123
text 00
text 001
text 12""")
for f1_line, f2_line in zip(f1, f2):
    if f1_line != f2_line:
        print(f2_line, end='')
Output:
text 00
text 001
text 12
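Note that `zip` pairs lines by position and stops at the end of the shorter file, so this gives the same result as a membership test only when matching lines sit on the same line number. A rough sketch of the difference, using made-up sample data (`alpha`, `beta`, `gamma` are not from the question):

```python
from io import StringIO

# Hypothetical data: the second file holds the same lines in a different
# order, plus one extra line
f1_text = "alpha\nbeta\n"
f2_text = "beta\nalpha\ngamma\n"

# Positional comparison: line i of file 2 against line i of file 1
positional = [l2 for l1, l2 in zip(StringIO(f1_text), StringIO(f2_text))
              if l1 != l2]

# Membership comparison: line of file 2 against every line of file 1
known = set(StringIO(f1_text))
membership = [l for l in StringIO(f2_text) if l not in known]

print(positional)  # ['beta\n', 'alpha\n']
print(membership)  # ['gamma\n']
```

The positional version reports the reordered lines and never reaches `gamma`, while the membership version reports only `gamma`.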