I want to add string elements from a for loop to a list (on each iteration) while keeping only the unique string values.
#!/usr/bin/env python
# Script will give an output in order to count the occurrence of BGC clusters per bacteria type
from Bio import SeqIO
# Biopython package for input file (input and output assorted sequence file formats)
from Bio.SeqFeature import SeqFeature
import numpy as np
import sys
import pandas as pd

# Set the input file to be used and the output file to be written to
gbk_file = "E2N166_1.final.gbk"  # example: "contig_1.final.gbk"
tsv_file = "cluster_count_file.txt"  # output file can be .csv, .txt etc...
cluster_output = open(tsv_file, "w")

# Extract the cluster info from the GBK file
for seq_record in SeqIO.parse(gbk_file, "genbank"):
    cluster_list = []
    for seq_feat in seq_record.features:
        if seq_feat.type == "cluster":
            cluster_number = seq_feat.qualifiers["note"][0].replace(" ", "_").replace(":", "")
            cluster_type = seq_feat.qualifiers["product"][0]
            cluster_list.append(cluster_type)
            # Write to the handle opened above (the original referred to an undefined name, cluster_out)
            cluster_output.write("#" + cluster_number + "\tCluster Type:" + cluster_type + "\n")

cluster_output.close()
print("File Cluster Out")
The output is as follows: Terminal Output
Answer 0 (score: 0)
If you want to store unique elements, I suggest using a set instead of a list. (Keep in mind that a set is unordered.)
cluster_set = set()
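As a minimal sketch of the set approach: the list `cluster_types` below is made-up sample data standing in for the values the script reads from `seq_feat.qualifiers["product"][0]`, and `unique_types` is a name introduced here for illustration.

```python
# Hypothetical cluster types, standing in for the values read from the GenBank file
cluster_types = ["nrps", "t1pks", "nrps", "terpene", "t1pks"]

cluster_set = set()
for cluster_type in cluster_types:
    # Adding a value that is already present leaves the set unchanged
    cluster_set.add(cluster_type)

# A set has no defined order, so sort it when a stable listing is needed
unique_types = sorted(cluster_set)
print(unique_types)
```

In the original script, the `cluster_set.add(cluster_type)` call would presumably take the place of `cluster_list.append(cluster_type)`.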
However, if you want to stick with a list, you can do the following:
if cluster_type not in cluster_list:
    cluster_list.append(cluster_type)
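Here is a self-contained sketch of the list variant, again using made-up sample data in place of the GenBank values. Unlike a set, it keeps the order in which each type was first seen. The `dict.fromkeys` line is an alternative not mentioned in the answer; it relies on dictionaries preserving insertion order (Python 3.7+).

```python
# Hypothetical cluster types, standing in for the values read from the GenBank file
cluster_types = ["nrps", "t1pks", "nrps", "terpene", "t1pks"]

cluster_list = []
for cluster_type in cluster_types:
    # Append only the first occurrence of each type
    if cluster_type not in cluster_list:
        cluster_list.append(cluster_type)

# Alternative: dict keys are unique and keep insertion order
deduped = list(dict.fromkeys(cluster_types))

print(cluster_list)
print(deduped)
```

The `not in` check scans the whole list each time, which is fine for a handful of cluster types but becomes slow for long lists; the `dict.fromkeys` form avoids that.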