Using Biopython to search NCBI

Date: 2016-05-09 21:21:53

Tags: python database search biopython ncbi

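My task is to use NCBI's E-utilities to retrieve the number of papers about the CRISPR/Cas9 system published in each of the past 10 years. How can I search multiple databases at once? My code so far: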

from Bio import Entrez


Entrez.email = "example@gmail.com"
# Dates must be passed as strings; an unquoted 2016/01/01 is an arithmetic expression, not a date
handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2016/01/01", maxdate="2016/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2016 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2015/01/01", maxdate="2015/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2015 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2014/01/01", maxdate="2014/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2014 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2013/01/01", maxdate="2013/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2013 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2012/01/01", maxdate="2012/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2012 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2011/01/01", maxdate="2011/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2011 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2010/01/01", maxdate="2010/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2010 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2009/01/01", maxdate="2009/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2009 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2008/01/01", maxdate="2008/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2008 is: %s" % record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2007/01/01", maxdate="2007/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2007 is: %s" % record["Count"])

1 answer:

Answer 0 (score: 1)

As you may have noticed, your code is highly redundant. This is a textbook case for a for loop:

from Bio import Entrez

years = range(2016, 2006, -1)  # The years 2016 down to 2007

Entrez.email = "Example@mail.org"

for year in years:  # Go through 'years' and assign each value to the variable 'year'
    handle = Entrez.esearch(db="pubmed", term="Crispr Cas9",
                            mindate=year, maxdate=year, datetype="pdat")
    record = Entrez.read(handle)
    handle.close()
    print("Number of papers in %d is %s" % (year, record["Count"]))  # 'Old' string formatting
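If you need the counts for further processing rather than just printed, the loop can be wrapped in a function that returns them. This is only a minimal sketch: the names `entrez_count` and `papers_per_year` are made up for illustration, and the `count` parameter is an assumption added so the logic can be tried with a stand-in function and no network access.

```python
def entrez_count(db, term, year):
    """Return how many records in `db` match `term` for one publication year."""
    # Imported lazily so papers_per_year can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = "Example@mail.org"  # replace with your own address
    handle = Entrez.esearch(db=db, term=term, mindate=str(year),
                            maxdate=str(year), datetype="pdat")
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


def papers_per_year(term, years, db="pubmed", count=entrez_count):
    """Map each year to its hit count; `count` is a hypothetical injection point."""
    return {year: count(db, term, year) for year in years}
```

Calling `papers_per_year("Crispr Cas9", range(2016, 2006, -1))` would then send one query per year and return a dictionary keyed by year.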

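It is also unlikely that every paper mentioning the CRISPR/Cas9 system uses the exact phrase "Crispr/Cas9" and also contains the word "system". Searching for "Crispr Cas9" returns more results.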
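To see how much the wording matters, and to cover the "multiple databases" part of the question, you can repeat the same search for each term and each database. An `esearch` request takes a single `db`, so the sketch below simply issues one request per combination. The helper names (`esearch_count`, `count_across`) are hypothetical, "pubmed" and "pmc" are only example database choices, and the `count` parameter is an assumption added so the function can be exercised offline.

```python
def esearch_count(db, term):
    """Return the total number of records in `db` matching `term`."""
    # Imported lazily so count_across can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = "Example@mail.org"  # replace with your own address
    handle = Entrez.esearch(db=db, term=term)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


def count_across(terms, databases, count=esearch_count):
    """Return {(database, term): hits} for every database/term combination."""
    return {(db, term): count(db, term)
            for db in databases
            for term in terms}
```

For example, `count_across(["Crispr/Cas9 system", "Crispr Cas9"], ["pubmed", "pmc"])` would run four searches and let you compare the totals side by side.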