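I have two files: file 1 has 2 columns and file 2 has 5 columns. I want to remove from file 2 every line that does not share its key strings with a line in file 1:
- file 1: if each line is treated as a list, it holds elements [0] and [1]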
gene-3 +
gene-2 -
gene-1 -
- file 2: compare [0] and [1] from file 1 with [0] and [4] of this file. If no line in file 1 matches a given line of file 2, that line must be removed.
gene-1 mga CDF 1 + # this line has + instead of -, although gene-1 is the same: remove
gene-2 mga CDS 1 - # [0][1] from file 1 = [0][4] from file 2: (gene-2, -), keep it!
gene-3 mga CDH 1 + # same as above: keep
gene-4 mga CDS 1 + # no gene-4 in file 1: remove
- expected output:
gene-3 mga CDH 1 +
gene-2 mga CDS 1 -
Any ideas?
Answer 0 (score: 1)
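with open("file1.txt") as f, open("file2.txt") as f1:
    # Each file1 line, e.g. "gene-2 -", becomes a lookup key.
    items = set(line.rstrip() for line in f)
    # split()[::4] takes columns 0 and 4 of a five-column line.
    filtered = [line for line in f1 if " ".join(line.split()[::4]) in items]
# Overwrite file2.txt only after both files have been read and closed.
with open("file2.txt", "w") as f3:
    f3.writelines(filtered)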
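The `[::4]` slice may look cryptic. On a five-column line, a step of 4 selects indices 0 and 4, which are the gene name and the sign. A small check, using one sample line from the question (the variable names are only for illustration):

```python
line = "gene-2 mga CDS 1 -\n"
columns = line.split()        # whitespace split also drops the newline
key = " ".join(columns[::4])  # step of 4 picks indices 0 and 4
print(key)
```

Note that this relies on each line having exactly five whitespace-separated columns, and on file 1 separating its two columns with a single space.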
Answer 1 (score: 0)
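with open('file1', 'r') as f:
    # Store each file1 line as a (name, sign) tuple for fast lookup.
    keepers = set(tuple(line.split()) for line in f)
with open('file2', 'r') as f_in, open('file3', 'w') as f_out:
    for line in f_in:
        parts = line.split()
        # Skip blank lines, then compare the first and last columns.
        if parts and (parts[0], parts[-1]) in keepers:
            f_out.write(line)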
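For testing the matching rule without touching any files, the same idea can be sketched as a function over lists of lines. This is only a sketch: the name `filter_lines` is hypothetical, and it assumes the sign is always in column index 4, so extra trailing columns are ignored.

```python
def filter_lines(file1_lines, file2_lines):
    """Keep file-2 lines whose columns 0 and 4 match a (name, sign) pair from file 1."""
    keepers = set()
    for line in file1_lines:
        parts = line.split()
        if len(parts) >= 2:  # ignore blank or short lines
            keepers.add((parts[0], parts[1]))

    kept = []
    for line in file2_lines:
        parts = line.split()
        # Use index 4 explicitly so extra trailing columns do not matter.
        if len(parts) >= 5 and (parts[0], parts[4]) in keepers:
            kept.append(line)
    return kept


# Sample data taken from the question, without the explanatory comments.
file1 = ["gene-3 +\n", "gene-2 -\n", "gene-1 -\n"]
file2 = [
    "gene-1 mga CDF 1 +\n",
    "gene-2 mga CDS 1 -\n",
    "gene-3 mga CDH 1 +\n",
    "gene-4 mga CDS 1 +\n",
]
result = filter_lines(file1, file2)
print("".join(result), end="")
```

The kept lines come out in file 2 order (gene-2 before gene-3), not in the order listed in file 1.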