Question

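I am using Biopython's wrapper around the NCBI E-utilities API to retrieve the proteins related to a given protein-coding gene: identical proteins and variant proteins (transcripts, splice variants, etc.).

This information is shown on the gene's NCBI page, under the "mRNA and Protein(s)" section.

I am retrieving identical proteins via LinkName=protein_protein_identical, and I am also querying via LinkName=protein_protein.

Is there a way to retrieve the transcripts of a protein-coding gene?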

Answer 1

This is easy but annoying (it involves XML madness). First retrieve your record from Entrez:

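from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

handle = Entrez.efetch(db="gene",
                       id="10555",
                       retmode="xml")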

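Now handle is a file-like object over the XML response. You can parse it with Entrez.parse() from Biopython, but I find the XML too tangled to deal with comfortably. Your mRNA and protein ids are nested under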

<Entrezgene_comments>
 <Gene-commentary>
  <Gene-commentary_comment>
   <Gene-commentary>
    <Gene-commentary_products>
     <Gene-commentary>
      <Gene-commentary_type value="mRNA">
       <Gene-commentary_products>
        <Gene-commentary>
         <Gene-commentary_type value="peptide">
          <Gene-commentary_accession>NP_001012745</Gene-commentary_accession>

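After parsing with Entrez.parse(), you will be diving through a mix of dicts and lists until you reach your accession id. Once you have this id, you can request the sequence with: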

handle = Entrez.efetch(db="protein",
                       id="NP_001012745",
                       rettype="fasta",
                       retmode="text")
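If the dict-and-list diving gets tiresome, one alternative is to skip Entrez.parse() and walk the raw XML with the standard library's ElementTree. The sketch below is only an illustration: collect_accessions and SAMPLE_XML are names made up here, and the sample is a hand-written fragment modeled on the nesting shown above, not real NCBI output. It assumes every Gene-commentary of interest has a Gene-commentary_type child with a value attribute and a Gene-commentary_accession child.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment modeled on the nesting above (NOT real NCBI output).
SAMPLE_XML = """
<Entrezgene-Set><Entrezgene><Entrezgene_comments>
 <Gene-commentary>
  <Gene-commentary_products>
   <Gene-commentary>
    <Gene-commentary_type value="mRNA"></Gene-commentary_type>
    <Gene-commentary_accession>NM_001012727</Gene-commentary_accession>
    <Gene-commentary_products>
     <Gene-commentary>
      <Gene-commentary_type value="peptide"></Gene-commentary_type>
      <Gene-commentary_accession>NP_001012745</Gene-commentary_accession>
     </Gene-commentary>
    </Gene-commentary_products>
   </Gene-commentary>
  </Gene-commentary_products>
 </Gene-commentary>
</Entrezgene_comments></Entrezgene></Entrezgene-Set>
"""


def collect_accessions(xml_text, wanted=("mRNA", "peptide")):
    """Return {type: sorted accessions} for Gene-commentary nodes of the wanted types."""
    root = ET.fromstring(xml_text)
    found = {kind: set() for kind in wanted}
    # iter() visits Gene-commentary elements at every nesting depth.
    for node in root.iter("Gene-commentary"):
        # find() looks at direct children only, so each accession is
        # matched with the type of its own commentary, not a nested one.
        type_el = node.find("Gene-commentary_type")
        acc_el = node.find("Gene-commentary_accession")
        if type_el is None or acc_el is None:
            continue
        kind = type_el.get("value")
        if kind in found and acc_el.text:
            found[kind].add(acc_el.text.strip())
    return {kind: sorted(accs) for kind, accs in found.items()}


if __name__ == "__main__":
    print(collect_accessions(SAMPLE_XML))
```

With a live record you would pass handle.read() instead of SAMPLE_XML. A real gene record may list the same accession in several places and may include non-RefSeq accessions of type mRNA, so expect to filter the result (for example on the NM_ and NP_ prefixes).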

Another approach involves parsing the gene_table. Get a handle as before, but instead of XML ask for the gene_table:

handle = Entrez.efetch(db="gene",
                       id="10555",
                       rettype="gene_table",
                       retmode="text")

In the gene_table you will find lines of the form:

mRNA transcript variant 2 NM_001012727.1
protein isoform b precursor NP_001012745.1
Exon table for  mRNA  NM_001012727.1 and protein NP_001012745.1

from which you can get the ids.
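A rough sketch of pulling the ids out of that text with a regular expression follows. It assumes the "Exon table for mRNA ... and protein ..." line looks exactly like the example above; the real gene_table layout may differ or change. transcript_protein_pairs and SAMPLE_TABLE are names invented for this illustration.

```python
import re

# Matches lines such as:
#   Exon table for  mRNA  NM_001012727.1 and protein NP_001012745.1
EXON_TABLE_RE = re.compile(
    r"Exon table for\s+mRNA\s+(\S+)\s+and\s+protein\s+(\S+)"
)

# Sample text copied from the lines quoted above.
SAMPLE_TABLE = """\
mRNA transcript variant 2 NM_001012727.1
protein isoform b precursor NP_001012745.1
Exon table for  mRNA  NM_001012727.1 and protein NP_001012745.1
"""


def transcript_protein_pairs(gene_table_text):
    """Return (mRNA accession, protein accession) pairs in order of appearance."""
    pairs = []
    for line in gene_table_text.splitlines():
        match = EXON_TABLE_RE.search(line)
        # Skip non-matching lines and pairs that were already seen.
        if match and match.groups() not in pairs:
            pairs.append(match.groups())
    return pairs


if __name__ == "__main__":
    for mrna, protein in transcript_protein_pairs(SAMPLE_TABLE):
        print(mrna, protein)
```

With a live query you would pass handle.read() from the gene_table request instead of SAMPLE_TABLE. Keying on the "Exon table" line has the advantage that each transcript comes already paired with its protein.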

Biopython: retrieving the protein transcripts of a protein-coding gene
