Question

I am trying to generate a weblogo for the protein sequences below. Here is my code:

from Bio.Seq import Seq
from Bio import motifs
from Bio.Alphabet import generic_protein

instances = [Seq("RWST"),
              Seq("RTAG"),
              Seq("RQGC"),
              Seq("RMAA"),
             ]


m = motifs.create(instances)
m.weblogo("mymotif.png")

I get the following error:

counts[letter][position] += 1
KeyError: 'R'

Full stack trace:

<ipython-input-3-ee8922743152> in <module>()
     10 
     11 
---> 12 m = motifs.create(instances)
     13 m.weblogo("mymotif.png")

lib/site-packages/Bio/motifs/__init__.py in create(instances, alphabet)
     21 def create(instances, alphabet=None):
     22     instances = Instances(instances, alphabet)
---> 23     return Motif(instances=instances, alphabet=alphabet)
     24 
     25 

lib/site-packages/Bio/motifs/__init__.py in __init__(self, alphabet, instances, counts)
    236             self.instances = instances
    237             alphabet = self.instances.alphabet
--> 238             counts = self.instances.count()
    239             self.counts = matrix.FrequencyPositionMatrix(alphabet, counts)
    240             self.length = self.counts.length

lib/site-packages/Bio/motifs/__init__.py in count(self)
    192         for instance in self:
    193             for position, letter in enumerate(instance):
--> 194                 counts[letter][position] += 1
    195         return counts
    196 

KeyError: 'R'

Answer 1

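Motif takes alphabet as a keyword (named) argument, and so does motifs.create. If it is not given, Biopython assumes the sequences are DNA, and in your case R is not found in that alphabet, hence the KeyError. For your example, you need to pass IUPAC.protein to make it work.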

Note: Biopython internally uses the alphabet's letters attribute to see which characters are available, and generic_protein has no letters.

from Bio import motifs
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq

instances = [Seq("RWST", IUPAC.protein),
             Seq("RTAG", IUPAC.protein),
             Seq("RQGC", IUPAC.protein),
             Seq("RMAA", IUPAC.protein),
            ]

m = motifs.create(instances, IUPAC.protein)
m.weblogo("mymotif.png")
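To illustrate why the missing alphabet leads to KeyError: 'R', here is a minimal standard-library sketch of the counting step seen in the stack trace. It is not Biopython's actual code: the function count_letters and the constants DNA_LETTERS and PROTEIN_LETTERS are hypothetical names chosen for this example, and the assumption is simply that the count table has one row per letter of the alphabet.

```python
# Hypothetical stand-ins for the alphabets' letters (not Biopython objects).
DNA_LETTERS = "ACGT"
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


def count_letters(instances, letters):
    """Count, per position, how often each letter of `letters` occurs."""
    length = len(instances[0])
    # One row per letter in the alphabet; letters outside it have no row.
    counts = {letter: [0] * length for letter in letters}
    for instance in instances:
        for position, letter in enumerate(instance):
            # Raises KeyError if `letter` is not part of the alphabet.
            counts[letter][position] += 1
    return counts


instances = ["RWST", "RTAG", "RQGC", "RMAA"]

# With DNA letters, 'R' has no row in the table, so counting fails.
dna_error = None
try:
    count_letters(instances, DNA_LETTERS)
except KeyError as exc:
    dna_error = exc
    print("KeyError:", exc)

# With protein letters, every residue has a row and counting succeeds.
protein_counts = count_letters(instances, PROTEIN_LETTERS)
print(protein_counts["R"])  # [4, 0, 0, 0]
```

The same sequences count cleanly once the table is built from the protein letters, which is what passing IUPAC.protein achieves in the answer above.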
