Find common lines in 2 different files

Time: 2018-11-14 18:23:43

Tags: python

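I am trying to find the lines that are common to 2 different files and list them in a new text file. I wrote the script below, but it does not find the common lines; it just writes out whatever file I pass as arg2. Please help me troubleshoot it.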

#!/usr/bin/python

import sys


def find_common_lines(arg1, arg2, arg3):
    fh1 = open(arg1, 'r+')
    fh2 = open(arg2, 'r+')
    with open(arg3, 'w+') as f:
        for line in fh1 and fh2:
            if line:
                f.write(line)

    fh1.close()
    fh2.close()


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)

So basically, what I want this script to do is:

File A

AAB
BBC
DDE
GGC

File B

123
AAB
DDE
345
GHY
GJK

File C

AAB
DDE

Thanks!

3 answers:

Answer 0: (score: 1)

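First, "and" is a boolean operator, not a way to loop over two files at once. The expression fh1 and fh2 evaluates to fh2 whenever fh1 is truthy, and an open file object always is, so your for loop only iterates over fh2 and writes every line of it. Try changing these lines:

for line in fh1 and fh2:
    if line:
        f.write(line)

to:

# Read the first file into a set, then keep the lines of fh2 found in it.
lines1 = set(fh1)
for line in fh2:
    if line in lines1:
        f.write(line)

The comparison includes the trailing newline, so a last line without one will not match; strip the lines first if that matters.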
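
To see why the original loop copies only the second file, here is a minimal sketch. It assumes plain lists as stand-ins for the two open file objects (the data is the sample from the question), and the names iterated, seen and common are made up for illustration.

```python
# Lists stand in for the two open file objects (illustrative data only).
fh1 = ["AAB\n", "BBC\n", "DDE\n", "GGC\n"]
fh2 = ["123\n", "AAB\n", "DDE\n", "345\n", "GHY\n", "GJK\n"]

# `x and y` evaluates to y whenever x is truthy, so this is simply fh2.
iterated = fh1 and fh2

# What the loop was meant to do: keep the lines of fh2 that also occur in fh1.
seen = set(fh1)
common = [line for line in fh2 if line in seen]

print(iterated is fh2)
print(common)
```

Looking lines up in a set keeps each membership test cheap, which matters once the files grow beyond a few lines.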

Answer 1: (score: 0)

You can use Python's pandas library for this.

Create a DataFrame for each .txt file, like this:

In [2016]: import pandas as pd

In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)

In [2018]: df_A
Out[2018]: 
     0
0  AAB
1  BBC
2  DDE
3  GGC

In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)

In [2020]: df_B
Out[2020]: 
     0
0  123
1  AAB
2  DDE
3  345
4  GHY
5  GJK

Now merge the two DataFrames (as an inner join) to keep only the rows common to both.

In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')

In [2022]: df_C
Out[2022]: 
     0
0  AAB
1  DDE

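You can then write this output to a file like this (header=False keeps the column label 0 out of the file):

In [2023]: df_C.to_csv('out.csv', index=False, header=False)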

This works without explicit loops or complicated regular expressions, which keeps the code short and clear.

Let me know if this helps.

Answer 2: (score: 0)

Try using a dictionary:

import sys


def find_common_lines(arg1, arg2, arg3):
    # Store every stripped line of the first file as a dict key for fast lookups.
    alllines_dict = {}
    with open(arg1, 'r') as f:
        for line in f:
            alllines_dict[line.strip()] = 1
    # Write each line of the second file that also occurs in the first.
    with open(arg3, 'w') as out, open(arg2, 'r') as f:
        for line2 in f:
            line2 = line2.strip()
            if line2 in alllines_dict:
                out.write(line2 + '\n')


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
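
Since the dictionary values are never used, the same idea can be sketched with a set. This is only an illustrative variant, not the answer's code: the helper name common_lines and the temporary file names are assumptions, and the demo data is the sample from the question.

```python
import os
import tempfile


def common_lines(path_a, path_b):
    """Return the lines of path_b that also occur in path_a, in path_b's order."""
    with open(path_a) as fa:
        seen = {line.rstrip("\n") for line in fa}
    with open(path_b) as fb:
        return [line.rstrip("\n") for line in fb if line.rstrip("\n") in seen]


# Demo with the sample data from the question, written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "A.txt")
    b = os.path.join(tmp, "B.txt")
    with open(a, "w") as f:
        f.write("AAB\nBBC\nDDE\nGGC\n")
    with open(b, "w") as f:
        f.write("123\nAAB\nDDE\n345\nGHY\nGJK\n")
    result = common_lines(a, b)

print(result)
```

Returning a list instead of writing straight to the output file makes the function easy to test; the caller can still write the result out with "\n".join(result).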