Basically, I have two text files.
Text file A (contains repeated strings):
hg17_chr2_74388709_74389
hg17_chr5_137023651_1370
hg17_chr7_137880501_1378
hg17_chr5_137023651_1370
Text file B:
hg17_chrX_52804801_52805856
hg17_chr15_79056833_79057564
hg17_chr2_74388709_74389559
hg17_chr1_120098891_120099441
hg17_chr5_137023651_137024301
hg17_chr11_85997073_85997627
hg17_chr7_137880501_137881251
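File A was truncated by a tool, so a line in file A matches a line in file B when their first 24 characters are identical. How can I match the two files and write the result to a new file with the following desired content: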
hg17_chr2_74388709_74389559
hg17_chr5_137023651_137024301
hg17_chr7_137880501_137881251
hg17_chr5_137023651_137024301
Answer 0 (score: 1)
This might be an option to consider:
with open("file_C.txt", "w") as f_3:  # Open file C
    with open("file_A.txt") as f_1:  # Open file A
        for line_a in f_1:  # Iterates over each line in file A
            with open("file_B.txt") as f_2:  # Open file B
                for line_b in f_2:  # Iterates over each line in file B
                    # If line in file B starts as line in file A
                    if line_b.startswith(line_a.rstrip()):
                        f_3.write(line_b)  # Write line of file B
                        # breaks the loop of file_b
                        # to continue with the next line in file_a
                        break
Answer 1 (score: 1)
A simple solution that opens each file only once:
with open('file_a', 'r') as fa:  # open file a --> read the file into a list
    list_a = fa.read().splitlines()
with open('file_b', 'r') as fb:  # open file b --> read the file into a list
    list_b = fb.read().splitlines()
# keep each line of list_b whose first 24 characters appear in list_a
# (results follow the order of file b; each line of file b is kept at most once)
match_list = [n for n in list_b if n[:24] in list_a]
with open('file_c', 'w+') as fc:  # write the matching list to the new file
    fc.write('\n'.join(match_list))
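Note that the list comprehension above walks file B, so the output follows file B's order and each line of file B appears at most once, while the desired output in the question follows file A's order and repeats the chr5 entry. A minimal sketch of one way to get that, assuming a dictionary keyed on the first 24 characters is acceptable: the function names `match_by_prefix` and `match_files` and the `prefix_len` parameter are made up for this illustration, and if several lines in file B share a prefix, only the first one is used.

```python
def match_by_prefix(trimmed_lines, full_lines, prefix_len=24):
    """Return the matching full line for each trimmed line, in trimmed-line order."""
    # Map each prefix to the first non-empty full line that starts with it.
    full_by_prefix = {}
    for line in full_lines:
        if line:
            full_by_prefix.setdefault(line[:prefix_len], line)
    # Walk the trimmed lines so their order and repeats are preserved;
    # trimmed lines without a match (including blank ones) are skipped.
    return [
        full_by_prefix[t[:prefix_len]]
        for t in trimmed_lines
        if t[:prefix_len] in full_by_prefix
    ]


def match_files(path_a, path_b, path_out, prefix_len=24):
    """Read both files once each and write the matched full lines to path_out."""
    with open(path_a) as fa:
        trimmed = [line.strip() for line in fa]
    with open(path_b) as fb:
        full = [line.strip() for line in fb]
    matches = match_by_prefix(trimmed, full, prefix_len)
    with open(path_out, "w") as fc:
        fc.writelines(m + "\n" for m in matches)
```

With the file names from the first answer, this would be called as `match_files("file_A.txt", "file_B.txt", "file_C.txt")`. The dictionary lookup also avoids rescanning file B for every line of file A, which may matter if the files are large.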