Question

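I am trying to fetch FASTA sequences for accession numbers from NCBI using the Biopython module. Usually the sequences download successfully, but occasionally I get the following error: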

http.client.IncompleteRead: IncompleteRead(61808640 bytes read)

I searched for answers to "How to handle IncompleteRead: in python".

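I tried the top answer, https://stackoverflow.com/a/14442358/4037275, and it works. However, the problem is that it downloads only part of the sequence. Is there another way? Can anyone point me in the right direction?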

Thank you for your time.

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "my email id"  # NCBI asks for a real contact address here


def extract_fasta_sequence(NC_accession):
    """Fetch the FASTA sequence for an NC accession number and return it as text."""
    print("Extracting the fasta sequence for the NC_accession:", NC_accession)
    handle = Entrez.efetch(db="nucleotide", id=NC_accession, rettype="fasta", retmode="text")
    try:
        record = handle.read()
    finally:
        handle.close()  # release the HTTP connection even if read() fails
    return record
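Since the concern in the question is a silently truncated sequence, it may help to detect truncation instead of accepting whatever bytes arrived. The sketch below assumes the expected sequence length is known from some other source (for example the Length field of an Entrez esummary record for the same accession; that source is an assumption and not part of the original question). The helper names fasta_sequence_length and is_complete are hypothetical, and the parser only handles a single-record FASTA string.

```python
def fasta_sequence_length(fasta_text: str) -> int:
    """Count residues in a single-record FASTA string (header line excluded)."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: missing '>' header line")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    # Sequence lines may be wrapped at any width, so sum their stripped lengths.
    return sum(len(line.strip()) for line in lines[1:])


def is_complete(fasta_text: str, expected_length: int) -> bool:
    """True only if the downloaded record has exactly the expected number of residues."""
    return fasta_sequence_length(fasta_text) == expected_length
```

A record that fails is_complete could then be fetched again rather than written to disk.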

Answer 1

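You need to add a try/except to catch common network errors like this one. Note that the exception http.client.IncompleteRead (httplib.IncompleteRead in Python 2) is a subclass of the more general HTTPException; see: https://docs.python.org/3/library/http.client.html#http.client.IncompleteRead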
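A minimal sketch of that try/except, written as a retry loop rather than a patch that returns the partial data: on IncompleteRead the partial bytes are discarded and the whole request is made again, so a truncated sequence is never handed back. The function name fetch_with_retry and the attempts/delay defaults are hypothetical choices, not part of Biopython.

```python
import http.client
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], T], attempts: int = 3, delay: float = 2.0) -> T:
    """Call fetch(); if the HTTP body is cut short, wait and repeat the whole request."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[http.client.IncompleteRead] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except http.client.IncompleteRead as exc:
            # exc.partial holds the bytes received before the connection dropped;
            # they are deliberately thrown away instead of being returned.
            last_error = exc
            print(f"Attempt {attempt}/{attempts}: incomplete read "
                  f"({len(exc.partial)} bytes received)")
            if attempt < attempts:
                time.sleep(delay)  # brief pause before asking NCBI again
    assert last_error is not None
    raise last_error  # every attempt failed: surface the last error
```

For the question's code, the callable should contain both the efetch call and handle.read(), for example fetch_with_retry(lambda: extract_fasta_sequence(NC_accession)) if that function returns the text it reads. Re-running efetch matters: calling read() again on a connection that already broke will not recover the missing bytes.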

For example: http://lists.open-bio.org/pipermail/biopython/2011-October/013735.html

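See also https://github.com/biopython/biopython/pull/590, which would catch some of the other errors you may run into with the NCBI Entrez API (errors that NCBI ought to handle itself, but doesn't).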
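Beyond IncompleteRead, a retry loop could decide from the exception type whether an error is worth retrying at all. The classifier below is a hypothetical sketch and is not what the linked pull request implements; in particular, treating HTTP 429 and 5xx responses as temporary while treating other status codes (such as 400 for a bad ID) as permanent is an assumption.

```python
import http.client
import socket
import urllib.error

# Assumption: rate limiting (429) and server-side errors (5xx) are temporary;
# other HTTP status codes usually mean the request itself is wrong.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_transient(exc: BaseException) -> bool:
    """Return True if exc looks like a temporary network or server problem."""
    if isinstance(exc, urllib.error.HTTPError):
        # HTTPError is a subclass of URLError, so it must be checked first.
        return exc.code in RETRYABLE_STATUS
    # HTTPException covers IncompleteRead; URLError covers connection failures.
    return isinstance(exc, (http.client.HTTPException, urllib.error.URLError,
                            socket.timeout, ConnectionError))
```

A retry loop would then catch Exception, re-raise immediately when is_transient returns False, and otherwise wait and try again.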
