I need to replace some characters in a file.

    letters = ["B", "Z", "J", "U", "O"]
    for record in SeqIO.parse(inFile, "fasta"):
        for letter in letters:
            if letters in str(record.seq):
                print record.id
                record.seq = str(record.seq).replace(letter, "X")
                outFile.write(">%s\n%s\n" % (record.description, record.seq))
            else:
                outFile.write(">%s\n%s\n" % (record.description, record.seq))
                #pass

The problem is that the output looks like this: each record is written as many times as there are characters in my letters list:
> >ID:WP_004160595.1|Erwinia_amylovora_01SFR-BO|01SFR-BO|50S_ribosomal_protei..|630|NZ_CAPA01000010(58437):26053-26682:-1
> MIGLVGKKVGMTRIFTEDGVSIPVTVIEIEANRVTQVKGLENDGYTAIQVTTGAKKANRVTKPAAGHFAKAGVEAGRGLWEFRTAEGAEFTVGQSINVDIFADVKKVDVTGTSKGKGFAGTVKRWNFRTQDATHGNSLSHRVPGSIGQNQTPGKVFKGKKMAGQLGNERVTVQSLDVVRVDAERNLLLVKGAVPGATGSDLIVKPAVKA
> >ID:WP_004160595.1|Erwinia_amylovora_01SFR-BO|01SFR-BO|50S_ribosomal_protei..|630|NZ_CAPA01000010(58437):26053-26682:-1
> MIGLVGKKVGMTRIFTEDGVSIPVTVIEIEANRVTQVKGLENDGYTAIQVTTGAKKANRVTKPAAGHFAKAGVEAGRGLWEFRTAEGAEFTVGQSINVDIFADVKKVDVTGTSKGKGFAGTVKRWNFRTQDATHGNSLSHRVPGSIGQNQTPGKVFKGKKMAGQLGNERVTVQSLDVVRVDAERNLLLVKGAVPGATGSDLIVKPAVKA
> >ID:WP_004160595.1|Erwinia_amylovora_01SFR-BO|01SFR-BO|50S_ribosomal_protei..|630|NZ_CAPA01000010(58437):26053-26682:-1
> MIGLVGKKVGMTRIFTEDGVSIPVTVIEIEANRVTQVKGLENDGYTAIQVTTGAKKANRVTKPAAGHFAKAGVEAGRGLWEFRTAEGAEFTVGQSINVDIFADVKKVDVTGTSKGKGFAGTVKRWNFRTQDATHGNSLSHRVPGSIGQNQTPGKVFKGKKMAGQLGNERVTVQSLDVVRVDAERNLLLVKGAVPGATGSDLIVKPAVKA
> >ID:WP_004160595.1|Erwinia_amylovora_01SFR-BO|01SFR-BO|50S_ribosomal_protei..|630|NZ_CAPA01000010(58437):26053-26682:-1
> MIGLVGKKVGMTRIFTEDGVSIPVTVIEIEANRVTQVKGLENDGYTAIQVTTGAKKANRVTKPAAGHFAKAGVEAGRGLWEFRTAEGAEFTVGQSINVDIFADVKKVDVTGTSKGKGFAGTVKRWNFRTQDATHGNSLSHRVPGSIGQNQTPGKVFKGKKMAGQLGNERVTVQSLDVVRVDAERNLLLVKGAVPGATGSDLIVKPAVKA
> >ID:WP_004160595.1|Erwinia_amylovora_01SFR-BO|01SFR-BO|50S_ribosomal_protei..|630|NZ_CAPA01000010(58437):26053-26682:-1
> MIGLVGKKVGMTRIFTEDGVSIPVTVIEIEANRVTQVKGLENDGYTAIQVTTGAKKANRVTKPAAGHFAKAGVEAGRGLWEFRTAEGAEFTVGQSINVDIFADVKKVDVTGTSKGKGFAGTVKRWNFRTQDATHGNSLSHRVPGSIGQNQTPGKVFKGKKMAGQLGNERVTVQSLDVVRVDAERNLLLVKGAVPGATGSDLIVKPAVKA
Answer 0 (score: 4)
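I think what you are trying to do is replace the ambiguous IUPAC amino acid codes (plus some other letters you somehow ended up with?) with 'X'.
It is best to use str.translate() (in Python 3) to make all the replacements in one pass. Also, since you are already using Biopython to read the file, you might as well use Biopython to write the output file.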

    from Bio import SeqIO
    from Bio.Seq import Seq

    letters = ["B", "Z", "J", "U", "O"]
    # Translation table that maps every listed letter to 'X'
    trans_tab = str.maketrans(''.join(letters), 'X'*len(letters))

    def yield_seqs(in_file):
        # Lazily yield each record with the listed letters masked
        for record in SeqIO.parse(in_file, 'fasta'):
            record.seq = Seq(str(record.seq).translate(trans_tab))
            yield record

    SeqIO.write(yield_seqs('input.fasta'), 'output.fasta', 'fasta')

Example:

    $ cat input.fasta
    >1
    MBZJ
    $ python3 myscript.py
    $ cat output.fasta
    >1
    MXXX

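To see what the translation table does on its own, here is a minimal sketch that uses only the standard library and plain strings. The helper name mask_residues and its default arguments are hypothetical, chosen for illustration; they are not part of Biopython.

```python
# Hypothetical helper (not part of Biopython): replace every character
# listed in `letters` with a single replacement character.
def mask_residues(sequence, letters="BZJUO", replacement="X"):
    # str.maketrans pairs each character of its first argument with the
    # character at the same position in its second argument.
    table = str.maketrans(letters, replacement * len(letters))
    # str.translate applies all of the substitutions in one pass.
    return sequence.translate(table)


if __name__ == "__main__":
    print(mask_residues("MBZJKUO"))  # prints MXXXKXX
```

Because the table is built once and applied in a single pass, there is no need to loop over the letters or to check whether each one occurs in the sequence.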
Answer 1 (score: 1)
You have a typo:

    if letters in str(record.seq):

instead of

    if letter in str(record.seq):

So your check always fails, and the else branch writes the record once for every letter.
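Note that in the question's loop both branches call outFile.write inside the inner loop, so correcting the typo alone would still write each record once per letter. A minimal sketch of the loop with the write moved out, assuming plain (description, sequence) string pairs and an in-memory buffer as stand-ins for the SeqIO records and the real output file (the function name write_masked is hypothetical):

```python
import io

letters = ["B", "Z", "J", "U", "O"]


def write_masked(records, out_file):
    # `records` is an iterable of (description, sequence) string pairs,
    # a stand-in for the SeqRecord objects yielded by SeqIO.parse.
    for description, sequence in records:
        for letter in letters:
            if letter in sequence:  # singular: test one letter at a time
                sequence = sequence.replace(letter, "X")
        # Write once per record, after every letter has been handled.
        out_file.write(">%s\n%s\n" % (description, sequence))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_masked([("1", "MBZJ"), ("2", "MIGLV")], buffer)
    print(buffer.getvalue(), end="")
```

With the write outside the inner loop, each record appears exactly once in the output whether or not it contained any of the letters.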