How do I match two text files, find the matches, and replace the original content?

Date: 2015-05-26 07:09:37

Tags: python regex

Basically, I have two text files.

Text file A (contains repeated strings):

hg17_chr2_74388709_74389
hg17_chr5_137023651_1370
hg17_chr7_137880501_1378
hg17_chr5_137023651_1370

Text file B:

hg17_chrX_52804801_52805856
hg17_chr15_79056833_79057564
hg17_chr2_74388709_74389559
hg17_chr1_120098891_120099441
hg17_chr5_137023651_137024301
hg17_chr11_85997073_85997627
hg17_chr7_137880501_137881251

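File A was trimmed by a tool, so for every string the first 24 characters are exactly the same in both files. How can I match the two files and write the result to a new file with the following desired content: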

hg17_chr2_74388709_74389559
hg17_chr5_137023651_137024301
hg17_chr7_137880501_137881251
hg17_chr5_137023651_137024301

2 answers:

Answer 0 (score: 1)

This might be an option to consider:

with open("file_C.txt", "w") as f_3:  # Open file C for writing
    with open("file_A.txt") as f_1:  # Open file A
        for line_a in f_1:  # Iterate over each line in file A
            prefix = line_a.rstrip()
            if not prefix:  # Skip blank lines: "" would match every line of B
                continue
            with open("file_B.txt") as f_2:  # Re-open file B for each line of A
                for line_b in f_2:  # Iterate over each line in file B
                    # If the line in file B starts with the line from file A
                    if line_b.startswith(prefix):
                        # Write the full line of file B, ensuring it ends with a newline
                        f_3.write(line_b.rstrip("\n") + "\n")
                        # Stop scanning file B and
                        # continue with the next line in file A
                        break
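Answer 0 re-opens and re-reads file B once for every line of A. As a minimal sketch (not part of the original answer), the same `startswith` matching can run against a list of B's lines that is read only once. The function name `match_by_prefix` and the sample lists are hypothetical, and the sketch assumes file B fits in memory. Because it walks A in order and keeps repeats, it reproduces the desired output from the question.

```python
def match_by_prefix(lines_a, lines_b):
    """For each non-blank line of A, return the first line of B that starts with it.

    A's order and repeats are kept, so a line that appears twice in A
    produces its match twice in the output.
    """
    result = []
    for a in lines_a:
        prefix = a.strip()
        if not prefix:
            continue  # a blank prefix would match every line of B
        for b in lines_b:
            if b.startswith(prefix):
                result.append(b)
                break  # keep only the first match, like the original loop
    return result


# Hypothetical in-memory stand-ins for file A and file B (sample data from the question)
sample_a = [
    "hg17_chr2_74388709_74389",
    "hg17_chr5_137023651_1370",
    "hg17_chr7_137880501_1378",
    "hg17_chr5_137023651_1370",
]
sample_b = [
    "hg17_chrX_52804801_52805856",
    "hg17_chr15_79056833_79057564",
    "hg17_chr2_74388709_74389559",
    "hg17_chr1_120098891_120099441",
    "hg17_chr5_137023651_137024301",
    "hg17_chr11_85997073_85997627",
    "hg17_chr7_137880501_137881251",
]

matches = match_by_prefix(sample_a, sample_b)
print("\n".join(matches))
```

With real files, `lines_b` could be filled once via `with open("file_B.txt") as f: lines_b = f.read().splitlines()`. This still performs up to len(A) × len(B) string comparisons, but it does not assume a fixed prefix length.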

Answer 1 (score: 1)

A simple solution that opens each file only once:

with open('file_a', 'r') as fa:  # open file a and read its lines into a list
    list_a = fa.read().splitlines()
with open('file_b', 'r') as fb:  # open file b and read its lines into a list
    list_b = fb.read().splitlines()

# Keep each line of list_b whose first 24 characters appear in list_a.
# A set makes each membership test O(1) instead of a scan of list_a.
# Note: the result follows file B's order and lists each match once,
# so the line repeated in file A is not repeated in the output.
set_a = set(list_a)
match_list = [n for n in list_b if n[:24] in set_a]

with open('file_c', 'w') as fc:  # write the matching lines to the new file
    fc.write('\n'.join(match_list))
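Answer 1 iterates over B, so its output follows B's order and lists the line that is repeated in A only once. A minimal sketch of a variant that reproduces the desired output exactly, assuming (as the question states) that each line of A is exactly the first 24 characters of some line in B: build a dictionary keyed on those 24 characters, then walk A in order. `match_by_key` and its `key_len` parameter are hypothetical names used for illustration.

```python
def match_by_key(lines_a, lines_b, key_len=24):
    """Replace each line of A with the line of B whose first key_len characters equal it."""
    # Map the first key_len characters of each non-blank line of B to the full line.
    # setdefault keeps the first line of B if two lines share the same key.
    lookup = {}
    for b in lines_b:
        b = b.strip()
        if b:
            lookup.setdefault(b[:key_len], b)

    # Walk A in order so repeated lines in A are repeated in the output;
    # lines of A with no counterpart in B are skipped.
    result = []
    for a in lines_a:
        key = a.strip()
        if key in lookup:
            result.append(lookup[key])
    return result


# Hypothetical in-memory stand-ins for file A and file B (sample data from the question)
trimmed = [
    "hg17_chr2_74388709_74389",
    "hg17_chr5_137023651_1370",
    "hg17_chr7_137880501_1378",
    "hg17_chr5_137023651_1370",
]
full = [
    "hg17_chrX_52804801_52805856",
    "hg17_chr15_79056833_79057564",
    "hg17_chr2_74388709_74389559",
    "hg17_chr1_120098891_120099441",
    "hg17_chr5_137023651_137024301",
    "hg17_chr11_85997073_85997627",
    "hg17_chr7_137880501_137881251",
]

restored = match_by_key(trimmed, full)
print("\n".join(restored))
```

Each lookup is a dictionary access, so the total work grows roughly linearly with the size of both files. The trade-off against a `startswith` scan is that this only works when the trimmed length is fixed and known.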