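I have 2 fastq files, F1.fastq and F2.fastq. F2.fastq is a smaller file containing a subset of the reads in F1.fastq. I want the reads in F1.fastq that are not in F2.fastq. The following Python code doesn't seem to work. Can you suggest edits?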
from Bio import SeqIO

needed_reads = []
reads_array = []
chosen_array = []
for x in SeqIO.parse("F1.fastq", "fastq"):
    reads_array.append(x)
for y in SeqIO.parse("F2.fastq", "fastq"):
    chosen_array.append(y)
# Compare every read in F2.fastq against every read in F1.fastq
for y in chosen_array:
    for x in reads_array:
        if str(x.seq) != str(y.seq):
            needed_reads.append(x)
output_handle = open("DIFF.fastq", "w")
SeqIO.write(needed_reads, output_handle, "fastq")
output_handle.close()
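The loop above compares each read of F2.fastq with every read of F1.fastq and appends x whenever the two sequences differ, so each F1 read is added once for every F2 read it does not match. A toy reproduction with plain strings in place of SeqRecord objects (the sequences and the function name nested_loop_diff are hypothetical) shows the effect:

```python
def nested_loop_diff(reads, chosen):
    """Mimics the loop in the question, using plain strings instead of SeqRecords."""
    needed = []
    for y in chosen:          # outer loop over the smaller file's reads
        for x in reads:       # inner loop over the larger file's reads
            if x != y:        # appends x for EVERY chosen read it differs from
                needed.append(x)
    return needed


# Hypothetical toy sequences: F1 has three reads, F2 holds two of them
f1_seqs = ["AAAA", "CCCC", "GGGG"]
f2_seqs = ["CCCC", "GGGG"]
print(nested_loop_diff(f1_seqs, f2_seqs))
# ['AAAA', 'GGGG', 'AAAA', 'CCCC']: duplicates, and CCCC is in F2
```

With a single read in F2 the loop happens to give the right answer, which can make the bug easy to miss; with two or more it returns duplicates as well as reads that are in F2.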
Answer (score: 2)
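You can use sets for this: convert list1 to a set, convert list2 to a set, and compute set(list1) - set(list2), which gives the items in list1 that are not in list2. This only works when the items are hashable and compare by value. Biopython SeqRecord objects do not compare by content, so build the sets from a key such as each read's ID (record.id) or its sequence (str(record.seq)) rather than from the records themselves.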
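A minimal sketch of the set difference on plain strings, here hypothetical read IDs standing in for the two files. Strings are hashable and compare by value, so the difference behaves as expected. Because a set is unordered, filtering the first list against it keeps the original order:

```python
# Hypothetical read IDs standing in for the records of each file
list1 = ["read1", "read2", "read3", "read4"]  # IDs from F1.fastq
list2 = ["read2", "read4"]                    # IDs from F2.fastq

# set(list1) - set(list2): items in list1 that are not in list2
diff = set(list1) - set(list2)
print(sorted(diff))  # ['read1', 'read3']

# A set has no order; filter list1 to keep the original file order
ordered = [item for item in list1 if item in diff]
print(ordered)  # ['read1', 'read3']
```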
Example code:
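from Bio import SeqIO

# Index the reads of F1.fastq by read ID (dicts keep insertion order)
reads_by_id = {x.id: x for x in SeqIO.parse("F1.fastq", "fastq")}
# SeqRecord objects are not compared by content, so take the IDs of F2.fastq instead
chosen_ids = {y.id for y in SeqIO.parse("F2.fastq", "fastq")}

# set(list1) - set(list2): IDs present in F1.fastq but not in F2.fastq
needed_ids = set(reads_by_id) - chosen_ids
needed_reads = [rec for rid, rec in reads_by_id.items() if rid in needed_ids]

with open("DIFF.fastq", "w") as output_handle:
    SeqIO.write(needed_reads, output_handle, "fastq")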
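If Biopython is not available, or F1.fastq is too large to hold in memory, a dependency-free sketch can stream the larger file and keep only a set of IDs from the smaller one. It assumes strictly four-line FASTQ records (no wrapped sequence or quality lines) and takes the read ID as the first whitespace-separated word of the header after '@'. The function names read_fastq, read_id and write_difference are hypothetical, not part of any library:

```python
import os
import tempfile


def read_fastq(path):
    """Yield (header, seq, plus, qual) lines; assumes strict 4-line FASTQ records."""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                return
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            yield header, seq, plus, qual


def read_id(header):
    """Read ID: the text after '@' up to the first whitespace."""
    return header[1:].split(None, 1)[0]


def write_difference(big_path, small_path, out_path):
    """Write reads of big_path whose ID is absent from small_path; return the count."""
    chosen_ids = {read_id(rec[0]) for rec in read_fastq(small_path)}
    kept = 0
    with open(out_path, "w") as out:
        for rec in read_fastq(big_path):  # stream the large file record by record
            if read_id(rec[0]) not in chosen_ids:
                out.writelines(rec)
                kept += 1
    return kept


# Tiny hypothetical files in a temporary directory to exercise the function
tmpdir = tempfile.mkdtemp()
f1 = os.path.join(tmpdir, "F1.fastq")
f2 = os.path.join(tmpdir, "F2.fastq")
diff_path = os.path.join(tmpdir, "DIFF.fastq")
with open(f1, "w") as fh:
    fh.write("@r1\nACGT\n+\nIIII\n@r2\nGGGG\n+\nIIII\n@r3\nTTTT\n+\nIIII\n")
with open(f2, "w") as fh:
    fh.write("@r2\nGGGG\n+\nIIII\n")

print(write_difference(f1, f2, diff_path))  # 2
```

Only the IDs of the smaller file are held in memory, so peak memory depends on F2.fastq rather than on F1.fastq.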