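I'm trying to find the lines that two different files have in common and list them in a new text file. I wrote the script below, but it doesn't find the common lines; it just writes out whatever file I pass as arg2. Please help me troubleshoot.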
```python
#!/usr/bin/python
import sys

def find_common_lines(arg1, arg2, arg3):
    fh1 = open(arg1, 'r+')
    fh2 = open(arg2, 'r+')
    with open(arg3, 'w+') as f:
        for line in fh1 and fh2:
            if line:
                f.write(line)
    fh1.close()
    fh2.close()

number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
```
So basically, this is what I want the script to do:

File A
AAB
BBC
DDE
GGC

File B
123
AAB
DDE
345
GHY
GJK

File C (output)
AAB
DDE

Thanks!
Answer 0 (score: 1)
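First, `and` does not combine the two files the way you expect. `fh1 and fh2` evaluates to `fh2`: a file object is always truthy, and `and` returns its second operand when the first one is truthy. So the `for` loop iterates over the second file only and writes every line of it, which is why your output is just a copy of arg2. Instead, test each line for membership in both files. Change these lines: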
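```python
for line in fh1 and fh2:
    if line:
        f.write(line)
```
to:
```python
lines2 = fh2.readlines()  # read the second file once; a file iterator is exhausted after one pass
for line in fh1:
    if line in lines2:  # the line occurs in both files
        f.write(line)
```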
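To see why the original loop only copies the second file, here is a small sketch that uses in-memory `io.StringIO` objects as stand-ins for the real files (an assumption for illustration; real file objects are truthy in the same way). It shows that `fh1 and fh2` is simply `fh2`, and that the membership test produces the expected File C lines:

```python
import io

# In-memory stand-ins for File A and File B (sample data from the question).
fh1 = io.StringIO("AAB\nBBC\nDDE\nGGC\n")
fh2 = io.StringIO("123\nAAB\nDDE\n345\nGHY\nGJK\n")

# A file object is always truthy, so `and` returns its second operand: fh2.
print((fh1 and fh2) is fh2)  # True

# Corrected approach: read the second file once, then test each line of the first.
lines2 = fh2.readlines()
common = [line for line in fh1 if line in lines2]
print(common)  # ['AAB\n', 'DDE\n']
```

Note that raw lines include their trailing `\n`, so a last line without a newline will not match the same text elsewhere; stripping each line before comparing avoids that.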
Answer 1 (score: 0)
You can use the Python library pandas (imported as `pd`) for this. Create a DataFrame for each .txt file, like this:
```
In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)

In [2018]: df_A
Out[2018]:
     0
0  AAB
1  BBC
2  DDE
3  GGC

In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)

In [2020]: df_B
Out[2020]:
     0
0  123
1  AAB
2  DDE
3  345
4  GHY
5  GJK
```
Now `merge` the two DataFrames with an inner join to keep only the rows common to both:
```
In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')

In [2022]: df_C
Out[2022]:
     0
0  AAB
1  DDE
```
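Then you can write this output to a file like this (`header=False` stops pandas from writing the column name `0` as the first line):
```
In [2023]: df_C.to_csv('out.csv', index=False, header=False)
```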
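This avoids writing an explicit loop (pandas performs the join internally) and needs no regular expressions, so the code stays short and readable.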
Let me know if this helps.
Answer 2 (score: 0)
Try using a dictionary:
```python
import sys

def find_common_lines(arg1, arg2, arg3):
    # Record every (stripped) line of the first file as a dictionary key.
    alllines_dict = {}
    with open(arg1, 'r') as f:
        while True:
            line = f.readline()
            if not line:
                break
            alllines_dict[line.strip()] = 1
    # Write each line of the second file that also appears in the first.
    with open(arg3, 'w') as out:
        with open(arg2, 'r') as f:
            while True:
                line2 = f.readline()
                if not line2:
                    break
                line2 = line2.strip()
                ispresent = alllines_dict.get(line2, None)
                if ispresent is not None:
                    out.write(line2 + '\n')

number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
```
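The dictionary in this answer only records which lines exist (its values are always `1`), so a `set` can do the same job, with average constant-time membership tests. Below is a minimal sketch of that variant; the function name `find_common_lines_set` and the temporary-file demo are hypothetical, added only to make the example self-contained. Like the dictionary version, it strips whitespace and writes matches in the order they appear in the second file:

```python
import os
import tempfile


def find_common_lines_set(path1, path2, out_path):
    """Write the lines of path2 that also occur in path1 (whitespace-stripped)."""
    # Collect every stripped line of the first file into a set.
    with open(path1) as f:
        seen = {line.strip() for line in f}
    # Stream the second file and keep only lines already seen in the first.
    with open(path2) as f, open(out_path, "w") as out:
        for line in f:
            line = line.strip()
            if line in seen:
                out.write(line + "\n")


# Demo with the sample data from the question, using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (os.path.join(tmp, name) for name in ("A.txt", "B.txt", "C.txt"))
    with open(a, "w") as f:
        f.write("AAB\nBBC\nDDE\nGGC\n")
    with open(b, "w") as f:
        f.write("123\nAAB\nDDE\n345\nGHY\nGJK\n")
    find_common_lines_set(a, b, c)
    with open(c) as f:
        result = f.read()

print(result, end="")  # prints AAB and DDE on separate lines
```

To use it from the command line, call it in place of `find_common_lines` in the argument-handling block above.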