Searching PubMed with Biopython

Time: 2016-10-20 18:03:04

Tags: xlrd biopython pubmed

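I am trying to look up more than 200 entries in order to record how many articles each author has published, refining the search by including his or her mentor and institution. I tried to do this with biopython and xlrd (code below), but I keep getting 0 results for all three query formats (1. by name, 2. by name and institution name, and 3. by name and mentor's name). Are there troubleshooting steps I can take, or should I use a different format when searching PubMed with the keywords specified below?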

Sample output of the input queries; search_term is a list of lists containing the input queries.

print(*search_term[8:15], sep='\n')


[text:'Andrew Bland', 'Weill Cornell Medical College', text:'David Cutler MD']
[text:'Andy Price', 'University of Alabama at Birmingham School of Medicine', text:'Jason Warem, PhD']
[text:'Bah Chamin', 'University of Texas Southwestern Medical School', text:'Dr. Timothy Hillar']
[text:'Eduo Cera', 'University of Colorado School of Medicine', text:'Dr. Tim']

Code used to generate the input queries above and search PubMed:

from Bio import Entrez

Entrez.email = "mollyzhaoe@college.harvard.edu"
for search_term in search_terms[8:55]:
    handle = Entrez.egquery(term="{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))

    handle_1 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))

    handle_2 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))

    record = Entrez.read(handle)
    record_1 = Entrez.read(handle_1)
    record_2 = Entrez.read(handle_2)
    pubmed_count = ['','','']
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[0] = row["Count"]

    for row in record_1["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[1] = row["Count"]

    for row in record_2["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[2] = row["Count"]

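1 answer:

Answer 0 (score: 1)

Check your indentation; it is hard to tell which part belongs to which loop.

If you want to troubleshoot, try printing the query you pass to egquery, e.g.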

print("{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))

and paste the output into PubMed to see what you get. Then modify it to find out which search term causes the problem.
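As a rough sketch of that troubleshooting step, all three query strings from the question can be built and printed without contacting NCBI at all, so each one can be pasted into the PubMed search box by hand. The helper name build_queries and the constant DATE_CLAUSE are hypothetical; the date clause itself is copied from the question.

```python
# Date restriction copied verbatim from the question's queries.
DATE_CLAUSE = "((2010[Date - Publication] : 2017[Date - Publication]))"


def build_queries(name, institution, mentor):
    """Return the three query strings from the question as plain text.

    Order matches the question's handles: name only, name + mentor,
    name + institution.
    """
    return [
        "{0} AND {1}".format(name, DATE_CLAUSE),
        "{0} AND {1} AND {2}".format(name, DATE_CLAUSE, mentor),
        "{0} AND {1} AND {2}".format(name, DATE_CLAUSE, institution),
    ]


# Print each query so it can be pasted into PubMed and checked by hand.
for query in build_queries(
    "Andrew Bland", "Weill Cornell Medical College", "David Cutler MD"
):
    print(query)
```

If a printed query returns nothing on the PubMed website either, the problem lies in the search string rather than in the Biopython call.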

Your input format is a bit hard to guess. Print the query and make sure you are getting the correct search values.
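One thing worth checking, as a guess based only on the printed sample: entries such as text:'Andrew Bland' look like xlrd cell objects rather than plain strings, in which case formatting them directly would put the text:'...' wrapper into the search term. The sketch below uses a hypothetical stand-in class FakeCell so that it runs without xlrd; the helper name cell_text is also made up. It assumes the real cells expose their content through a .value attribute.

```python
class FakeCell:
    """Hypothetical stand-in for a spreadsheet cell: has .value and a wrapped repr."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "text:{0!r}".format(self.value)


def cell_text(cell):
    """Return the plain string for a cell-like object, or for a plain string."""
    value = getattr(cell, "value", cell)  # fall back to the object itself
    return str(value).strip()


row = [
    FakeCell("Andrew Bland"),
    "Weill Cornell Medical College",
    FakeCell("David Cutler MD"),
]

# Formatting the cell object directly leaks the wrapper into the query text.
print("{0} AND 2010:2017[dp]".format(row[0]))

# Extracting the value first yields the plain strings a search expects.
print([cell_text(c) for c in row])
```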

For author names, try to get rid of academic titles; PubMed may confuse them with initials, e.g. House MD could be read as Mark David House.
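A minimal sketch of stripping titles before building the query follows. The helper name strip_titles and the TITLES set are assumptions for illustration; the list is not exhaustive and may wrongly drop a real name part that happens to match a title, so the cleaned names are worth eyeballing.

```python
import re

# Assumed, non-exhaustive set of titles and degrees (lowercase, no periods).
TITLES = {"dr", "prof", "professor", "md", "phd", "mph", "ms", "msc", "mba"}


def strip_titles(name):
    """Remove academic titles and degree suffixes from a personal name."""
    # Split on whitespace and commas so "Warem, PhD" separates cleanly.
    tokens = re.split(r"[\s,]+", name.strip())
    kept = [
        token
        for token in tokens
        if token and token.replace(".", "").lower() not in TITLES
    ]
    return " ".join(kept)


# Mentor names taken from the sample input in the question.
for raw in ["David Cutler MD", "Jason Warem, PhD", "Dr. Timothy Hillar"]:
    print(strip_titles(raw))
```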