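This is probably best summed up as "Python won't let me do something stupid", but I digress. I've gotten into the bad habit of converting XML to a string so that I can pull things out of it with regular expressions, instead of being a good person and doing it properly with XPath.
I've now run into a problem: I'm iterating over a list of dicts (the dicts themselves are several levels deep and contain the encoded XML). I'm trying re.findall(pattern, str(listitem)), and it gives me an "unhashable type: 'DictionaryElement'" error. Any ideas?
Edit: this is using Biopython to work with content from a published API: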
import re
from Bio import Entrez

handle = Entrez.efetch(db="pubmed", id=pmids, retmode="xml")
records = Entrez.read(handle)
records = list(records)
meshterms = {}
for y in records:
    meshterms[y] = re.findall(r'(?<=DescriptorName\'\:\sStringElement\(\').+?(?=\')', str(y))
y will contain something like the following:
{u'MedlineCitation': DictElement({u'OtherID': [], u'OtherAbstract': [], u'CitationSubset': ['IM'], u'KeywordList': [], u'DateCreated': {u'Month': '11', u'Day':
'20', u'Year': '2012'}, u'SpaceFlightMission': [], u'GeneralNote': [], u'Article': DictElement({u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '140-54'}, u'AuthorList': ListElement([DictElement({u'LastName': 'Goupil', u'Initials': 'L',
u'NameID': [], u'ForeName': 'Louise'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Bekinschtein', u'Initials': 'T', u'NameID': [], u'ForeName': 'Tristan'}, attributes={u'ValidYN': u'Y'})], attributes={u'Type': u'authors', u'CompleteYN': u'Y'}), u'Language': ['eng'], u'PublicationTypeList': ['Journal Article'], u'Journal': {u'ISSN': StringElement('0003-9829', attributes={u'IssnType': u'Print'}), u'ISOAbbreviation': 'Arch Ital Biol', u'JournalIssue': DictElement({u'Volume': '150', u'Issue': '2-3', u'PubDate': {u'Month': 'Jun', u'Year': '2012'}}, attributes={u'CitedMedium': u'Print'}), u'Title': 'Archives italiennes de biologie'}, u'Affiliation': 'MRC Cognition and Brain Sciences Unit, 15 Chauces Road, CB2 7EF, Cambridge,UK Email: louisegoupil@hotmal.fr.', u'ArticleTitle': 'Cognitive processing during the transition to sleep.', u'ELocationID': [StringElement('10.4449/aib.v150i2.1247', attributes={u'ValidYN': u'Y', u'EIdType': u'doi'})], u'Abstract': {u'AbstractText': ['Several dramatic physiological and behaviourl changes occur during the transition from wakefulness to sleep. The process is regarded as a grey area of consciousness between attentive wakefulness and slow wave sleep. Although there is evidence of neurophysiological integration decay
as signalled by sleep EEG elements, changes in power spectra and coherence, thalamocortical connectivity in fMRI, and single neuron changes in firing patterns,
little is known about the cognitive and behavioural dynamics of these transitions. Hereby we revise the body and brain physiology, behaviour and phenomenology of these changes of consciousness and propose an experimental framework to integrate the two aspects of consciousness that interact in the transition, wakefulness and awareness.']}}, attributes={u'PubModel': u'Print'}), u'PMID': StringElement('23165874', attributes={u'Version': u'1'}), u'MedlineJournalInfo': {u'MedlineTA': 'Arch Ital Biol', u'Country': 'Italy', u'NlmUniqueID': '0372441', u'ISSNLinking': '0003-9829'}}, attributes={u'Owner': u'NLM', u'Status': u'In-Data-Review'}), u'PubmedData': {u'ArticleIdList': [StringElement('23165874', attributes={u'IdType': u'pubmed'})], u'PublicationStatus': 'ppublish', u'History': [DictElement({u'Month': '2', u'Day': '07', u'Year': '2012'}, attributes={u'PubStatus': u'accepted'}), DictElement({u'Minute': '0', u'Month': '11', u'Day': '21', u'Hour': '6', u'Year': '2012'}, attributes={u'PubStatus': u'entrez'}), DictElement({u'Minute': '0', u'Month': '11', u'Day': '21', u'Hour': '6', u'Year': '2012'}, attributes={u'PubStatus': u'pubmed'}), DictElement({u'Minute': '0', u'Month': '11', u'Day': '21', u'Hour': '6', u'Year': '2012'}, attributes={u'PubStatus': u'medline'})]}
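My regex is trying to pull out the contents of the StringElement under DescriptorName (which, by the way, isn't present in the record above, but you get the idea).
Thanks!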
Answer 0 (score: 0)
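# Dump the record's string form to a scratch file, then read it back as a plain string.
with open("mcs_scratch.txt", "w") as scratch:
    scratch.write(str(y))
with open("mcs_scratch.txt", "r") as scratch:
    y = scratch.read()  # y is now a str, so it can be used as a dict key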
Somehow I doubt this is good practice, but it does work.
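For what it's worth, the error most likely does not come from re.findall at all but from meshterms[y]: y is a dict-like element, and dicts cannot be used as dictionary keys. The round trip through a file works only because y ends up as a plain string, which is hashable. Below is a minimal sketch of an alternative that keys the result by PMID and walks the parsed structure instead of regexing its string form. The DictElement class here is only a stand-in for Biopython's, the sample record is made up, and the MeshHeadingList / DescriptorName layout is an assumption about how indexed records are parsed (the record in the question has no MeSH terms yet).

```python
# Stand-in for Biopython's dict-like element: a dict subclass, hence unhashable.
class DictElement(dict):
    pass


def mesh_descriptors(record):
    """Return the DescriptorName strings of one parsed PubMed record."""
    citation = record.get("MedlineCitation", {})
    # Assumed layout: records not yet indexed simply lack MeshHeadingList.
    headings = citation.get("MeshHeadingList", [])
    return [str(heading["DescriptorName"]) for heading in headings]


# Hypothetical sample shaped like an indexed record (not real data).
sample = DictElement({
    "MedlineCitation": DictElement({
        "PMID": "23165874",
        "MeshHeadingList": [
            {"DescriptorName": "Sleep", "QualifierName": ["physiology"]},
            {"DescriptorName": "Wakefulness", "QualifierName": []},
        ],
    }),
})
records = [sample]

# Key by PMID (a string) rather than by the record itself.
meshterms = {}
for record in records:
    pmid = str(record["MedlineCitation"]["PMID"])
    meshterms[pmid] = mesh_descriptors(record)

# Using the record itself as a key reproduces the original kind of error.
probe = {}
try:
    probe[sample] = []
except TypeError as exc:
    error_message = str(exc)

print(meshterms)
print(error_message)
```

If the regex approach is kept anyway, the same idea applies: use the PMID or str(y) as the key and leave y itself untouched, which avoids the scratch file entirely.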