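I have a large XML file, "abcd.xml", of almost 800 MB. When the user's input matches an author or a title, I want to get the information for the matching books.
I got this working with a small file. How do I do the same for the large file using iterparse()?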
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
Code:
import lxml.etree as ET

# Parse the file and take the root element of the resulting tree
root = ET.parse('abcd.xml').getroot()
title = raw_input('enter the name: ')
# First article whose title starts with the user's input
article = root.xpath('.//article[starts-with(title, "%s")]' % title)[0]
for prop in ['author', 'pages', 'year', 'volume', 'journal']:
    print article.findtext(prop)
Output structure:
Sanjeev Saxena
Parallel Integer Sorting and Simulation Amongst CRCW Models.
607-619
1996
33
Acta Inf.
........
........
........
Answer 0 (score: 0)
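1. Parse the XML with the lxml module.
2. Read the user's input with raw_input().
3. Use XPath to find the article tags whose title starts with the user input from step 2.
4. Collect the tag and text of every child of each matching article.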
Code:
import lxml.etree as ET

root = ET.parse('input.xml')
title = raw_input('enter the name: ')
# Pass the input as an XPath variable so quotes in it cannot break the query
articles = root.xpath('.//article[starts-with(title, $t)]', t=title)

result = []
for article in articles:
    # Collect (tag, text) for every child of the matching article
    tmp = []
    for child in article:
        tmp.append((child.tag, child.text))
    result.append(tmp)

# Print result:
for record in result:
    print "\n"
    for tag, text in record:
        print "%s:%s" % (tag, text)
Output:
vivek@vivek:~/Desktop/stackoverflow/anna$ python 3.py
enter the name: Parallel Integer Sorting and Simulation
author:Sanjeev Saxena
title:Parallel Integer Sorting and Simulation Amongst CRCW Models.
pages:607-619
year:1996
volume:33
journal:Acta Inf.
number:7
url:db/journals/acta/acta33.html#Saxena96
ee:http://dx.doi.org/10.1007/BF03036466
author:Sanjeev Saxena
title:Parallel Integer Sorting and Simulation Amongst CRCW Models.11
pages:607-619
year:1996
volume:33
journal:Acta Inf.
number:7
url:db/journals/acta/acta33.html#Saxena96
ee:http://dx.doi.org/10.1007/BF03036466
vivek@vivek:~/Desktop/stackoverflow/anna$
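Neither snippet above streams the file, so with an 800 MB input the whole tree ends up in memory. Below is a minimal sketch of the iterparse() approach the question asks about. It is written for Python 3 and uses the standard library's xml.etree.ElementTree rather than lxml (lxml.etree.iterparse offers a similar interface). The names find_articles and matches are hypothetical. The sketch assumes every record is a direct child of the root element, as in the sample, and it does a prefix match against both author and title. One caveat: the standard-library parser does not load dblp.dtd, so named entities defined only in the DTD would raise a parse error. With lxml, the load_dtd option of iterparse may help there.

```python
import io
import xml.etree.ElementTree as ET


def matches(record, query):
    """True if any <author> or <title> child of record starts with query."""
    return any(
        (child.text or "").startswith(query)
        for child in record
        if child.tag in ("author", "title")
    )


def find_articles(source, query):
    """Stream source and yield [(tag, text), ...] for each matching <article>.

    source is a file name or a binary file object.
    """
    root = None
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if depth == 0:
                root = elem  # remember the root so finished records can be dropped
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue  # only act when a direct child of the root has just closed
        if elem.tag == "article" and matches(elem, query):
            yield [(child.tag, child.text) for child in elem]
        root.clear()  # discard processed records to keep memory use flat


if __name__ == "__main__":
    # Small in-memory stand-in for the real file; pass 'abcd.xml' instead.
    sample = io.BytesIO(
        b"<dblp><article><author>Sanjeev Saxena</author>"
        b"<title>Parallel Integer Sorting.</title></article></dblp>"
    )
    for record in find_articles(sample, input("enter the name: ")):
        print()
        for tag, text in record:
            print("%s:%s" % (tag, text))
```

Clearing the root after each top-level record is what keeps memory bounded. Without it, iterparse still builds the full tree as it goes.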