Question

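I am trying to compare two files and extract the rows of the first file whose first column matches the first column of the second file. For example: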

File 1:

VarID GeneID TaxName PfamName
3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22
6557609 5442    Peridiniales    NULL
4723299 7370    Prorocentrum    PEPCK_ATP
3019317 10454   Dinophyceae     NULL
2821675 10965   Bacillariophyta PK;PK_C
5559318 12824   Dinophyceae     Cyt-b5&FA_desaturase

File 2:

For the output I want this file:

VarID GeneID TaxName PfamName
3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22
6557609 5442    Peridiniales    NULL
4723299 7370    Prorocentrum    PEPCK_ATP

I tried this code:

import sys

f1 = sys.argv[1]
f2 = sys.argv[2]

file1_rows = []
with open(f1, 'r') as file1:
    for row in file1:
        file1_rows.append(row.split())

# Read data from the second file
file2_rows = []
with open(f2, 'r') as file2:    
    for row in file2:
        file2_rows.append(row.split())

# Compare data and compute results
results = []
for row in file2_rows:
    if row[:1] in file1_rows:
        results.append(row[:4])
    else:
        results.append(row[:4])

# Print the results
for row in results:
    print(' '.join(row))

Can you help me? Thanks!!

Answer 1

Your problem is here:

if row[:1] in file1_rows:

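row[:1] returns a list containing one field (the first column of the row), while file1_rows holds whole rows, each one a list of four fields, so the test never matches. Instead, compare the first field against the first column of each row in file1_rows.

Here is the new code:

if row[0] in [r[0] for r in file1_rows if r]: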

Also, remove the else associated with this if (I guess it was added by mistake while debugging).
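To illustrate the membership test discussed above, here is a small sketch. It assumes, as in the question's code, that each row of file 1 has been split into a list of fields. The names rows and row are only for illustration, and the single-field row stands in for a line of file 2, whose contents are not shown in the question.

```python
# Two rows of File 1 from the question, split into fields as in the question's code
rows = [
    "3810359 1327 Isochrysidaceae Methyltransf_21&Methyltransf_22".split(),
    "6557609 5442 Peridiniales NULL".split(),
]

# Hypothetical row from file 2 that holds only a VarID
row = ["3810359"]

# A one-element list is compared against whole four-field rows: no match
slice_match = row[:1] in rows

# A plain string is compared against lists: no match either
string_match = row[0] in rows

# The string is compared against the first column of every row: match
column_match = row[0] in [r[0] for r in rows]

print(slice_match, string_match, column_match)
```

Only the last test looks at the first column, which is what the question asks for.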

There are also some better practices you could follow. I have written them out here:

import sys

f1 = sys.argv[1]
f2 = sys.argv[2]

with open(f1, 'r') as file1:
    file1_rows = file1.read().splitlines()

# Read data from the second file
with open(f2, 'r') as file2:    
    file2_rows = file2.read().splitlines()

# Compare data and compute results
results = []
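for row2 in file2_rows:
    key = row2.split()[:1]  # first column of the file-2 row ([] for a blank line)
    for row in file1_rows:
        # Match on the first column only; "row2 in row" would match any substring
        if key and row.split()[:1] == key:
            results.append(row)
            break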

print('\n'.join(results))
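If the files are large, the nested loop above can be replaced by a set lookup. The following is only a sketch under some assumptions: filter_rows is a hypothetical helper name, file 2 is assumed to carry the IDs in its first column, and the sample file2_lines are made up because the question does not show file 2. The header line is kept only if file 2 also starts with VarID.

```python
def filter_rows(file1_lines, file2_lines):
    """Return the lines of file 1 whose first column occurs in file 2's first column."""
    # Collect the first column of every non-blank line in file 2
    wanted = set()
    for line in file2_lines:
        fields = line.split()
        if fields:
            wanted.add(fields[0])

    # Keep the lines of file 1 whose first column is one of the wanted IDs
    kept = []
    for line in file1_lines:
        fields = line.split()
        if fields and fields[0] in wanted:
            kept.append(line)
    return kept


# File 1 as shown in the question
file1_lines = [
    "VarID GeneID TaxName PfamName",
    "3810359 1327    Isochrysidaceae Methyltransf_21&Methyltransf_22",
    "6557609 5442    Peridiniales    NULL",
    "4723299 7370    Prorocentrum    PEPCK_ATP",
    "3019317 10454   Dinophyceae     NULL",
    "2821675 10965   Bacillariophyta PK;PK_C",
    "5559318 12824   Dinophyceae     Cyt-b5&FA_desaturase",
]

# Assumed contents of file 2 (not shown in the question)
file2_lines = ["VarID", "3810359", "6557609", "4723299"]

result = filter_rows(file1_lines, file2_lines)
print("\n".join(result))
```

Unlike the nested loop, this version returns the rows in the order they appear in file 1, and a number such as 1327 that occurs only in the GeneID column is not treated as a match.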

Comparing two text files using Python
