I am using BioPython to fill in a CSV file with citation information from PubMed. So far I have written this:
import csv
from Bio import Entrez
import bs4

Entrez.email = "my_email"
CSVfile = open('srData.csv')
fileReader = csv.reader(CSVfile)
Data = list(fileReader)
with open('blank.csv', 'w') as f1:
    writer = csv.writer(f1, delimiter='\t', lineterminator='\n')
    for id in Data:
        handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
        record = Entrez.read(handle)
        title = record[0]['MedlineCitation']['Article']['ArticleTitle']
        abstract = record[0]['MedlineCitation']['Article']['Abstract']
        mesh = record[0]['MedlineCitation']['MeshHeadingList']
        descriptors = ','.join(term['DescriptorName'] for term in mesh)
        writer.writerow([title, abstract, descriptors])
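However, this produces unusual output in which the title, abstract, and MeSH terms are spread across multiple columns instead of being kept separate, which I think is due to their types. I would like my CSV table to consist of three columns: one containing the title, one the abstract, and one the MeSH terms.
How can I achieve this?
Example output
To clarify, the first column contains the whole title plus the beginning of the abstract, and the next few columns contain the subsequent parts of the abstract. I need these separated into different columns, i.e. the first column should contain only the title, the second only the abstract, and the third only the MeSH terms.
Currently, the first column contains: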
"Distinct and combined vascular effects of ACE blockade and HMG-CoA reductase inhibition in hypertensive subjects. {u'AbstractText': ['Hypercholesterolemia and hypertension are frequently associated with elevated sympathetic activity. Both are independent cardiovascular risk factors and both affect endothelium-mediated vasodilation. To identify the effects of cholesterol-lowering and antihypertensive treatments on vascular reactivity and vasodilative capacity"
Answer 0 (score: 1)
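The value of record[0]['MedlineCitation']['Article']['Abstract'] is a dictionary containing the abstract text and a shorter summary. If you want the actual abstract, then instead of:
abstract=record[0]['MedlineCitation']['Article']['Abstract']
you need:
abstract=record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]
Now abstract contains a single string, which should be suitable for writing to your CSV file.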
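Putting this together, a minimal sketch of building one three-column row might look as follows. The helper name citation_row is hypothetical, and it assumes each record has the nested layout used in the question. It joins every entry of 'AbstractText' rather than taking only the first, and falls back to an empty string when a citation has no abstract or no MeSH headings. It is written for Python 3 and uses a hand-made stand-in record so that it runs without network access.

```python
import csv
import io


def citation_row(citation):
    """Return [title, abstract, mesh_terms] for one parsed PubMed record.

    `citation` is assumed to be shaped like record[0] in the question,
    i.e. a mapping with a 'MedlineCitation' key.
    """
    medline = citation['MedlineCitation']
    article = medline['Article']
    title = str(article['ArticleTitle'])
    # 'Abstract' is a dictionary; the text lives in its 'AbstractText' list.
    parts = article.get('Abstract', {}).get('AbstractText', [])
    abstract = ' '.join(str(part) for part in parts)
    # Some citations may have no MeSH headings at all.
    mesh = medline.get('MeshHeadingList', [])
    descriptors = ','.join(str(term['DescriptorName']) for term in mesh)
    return [title, abstract, descriptors]


# Stand-in record built from plain dicts (not real PubMed data).
sample = {
    'MedlineCitation': {
        'Article': {
            'ArticleTitle': 'A title',
            'Abstract': {'AbstractText': ['First part.', 'Second part.']},
        },
        'MeshHeadingList': [
            {'DescriptorName': 'Hypertension'},
            {'DescriptorName': 'Humans'},
        ],
    }
}

# Write the row to an in-memory buffer with the same csv settings as the question.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(citation_row(sample))
```

In the original loop, this would amount to calling writer.writerow(citation_row(record[0])) after Entrez.read(handle).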
Update
Even using the same input data, I cannot reproduce the error you describe in the comments:
>>> from Bio import Entrez
>>> Entrez.email = '...'
>>> id=10067800
>>> handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
>>> record = Entrez.read(handle)
>>> abstract=record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]
>>> abstract
StringElement('To assess the antihypertensive efficacy and safety of the novel AT1 receptor antagonist, telmisartan, compared with that of enalapril in elderly patients with mild to moderate hypertension.', attributes={u'NlmCategory': u'OBJECTIVE', u'Label': u'OBJECTIVE'})
>>>
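The attributes shown in that output (a 'Label' of 'OBJECTIVE') suggest that this abstract is structured, in which case 'AbstractText' would hold one entry per labelled section and [0] returns only the first. A rough sketch of joining all sections with their labels follows. join_sections is a hypothetical helper, and the Section class is only a stand-in for Biopython's StringElement (a string carrying an attributes dictionary) so that the example runs offline.

```python
class Section(str):
    """Stand-in for a parsed abstract section: a str with an `attributes` dict."""

    def __new__(cls, text, attributes=None):
        obj = super().__new__(cls, text)
        obj.attributes = attributes or {}
        return obj


def join_sections(parts):
    """Join abstract sections into one string, prefixing labels when present."""
    pieces = []
    for part in parts:
        # Plain strings have no `attributes`, so fall back to an empty dict.
        label = getattr(part, 'attributes', {}).get('Label')
        text = str(part)
        pieces.append('{}: {}'.format(label, text) if label else text)
    return ' '.join(pieces)


# Made-up sections for illustration only.
parts = [
    Section('To compare two drugs.', {'Label': 'OBJECTIVE'}),
    Section('A randomised trial.', {'Label': 'METHODS'}),
    'Unlabelled text.',
]
print(join_sections(parts))
```

With a real record, parts would be record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'].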