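Hello, I want to parse the following XML file with Python. My `folder` variable is always set to the 8-digit number at the end of the <link> tag. In this case, it is 11199709.
Python
for folder in folderList:
I'd like to be able to say: when `folder` equals the last 8 digits in the link tag, tell me what the eq:seconds value is. I tried the code from the Python docs for ElementTree, but I ran into trouble because there are too many levels of hierarchy. root[0][1].text does not retrieve the values under the item tag. Thanks for your help.
XML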
<rss xmlns:georss="http://www.georss.org/georss/" xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" version="2.0">
<channel>
<title>USGS Earthquake ShakeMaps</title>
<description>List of ShakeMaps for events in the last 30 days</description>
<link>http://earthquake.usgs.gov/</link>
<dc:publisher>U.S. Geological Survey</dc:publisher>
<pubDate>Thu, 27 Mar 2014 15:33:05 +0000</pubDate>
<item>
<title>4.11 - 79.3 miles NNW of Kotzebue</title>
<description>
<![CDATA[<img src="http://earthquake.usgs.gov/eqcenter/shakemap/thumbs/shakemap_ak_11199709.jpg" width="100" align="left" hspace="10"/><p>Date: Thu, 27 Mar 2014 07:28:31 UTC<br/>Lat/Lon: 67.9858/-163.494<br/>Depth: 15.9122</p>]]></description>
<link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
<pubDate>Thu, 27 Mar 2014 07:53:33 +0000</pubDate>
<geo:lat>67.9858</geo:lat>
<geo:long>-163.494</geo:long>
<dc:subject>4</dc:subject>
<eq:seconds>1395905311</eq:seconds>
<eq:depth>15.9122</eq:depth>
<eq:region>ak</eq:region>
</item>
<item>
...similar to above item
Answer 0 (score: 1)
If you are concerned about speed, I recommend lxml. It is an additional dependency, but it is generally much faster than BeautifulSoup.
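As a rough sketch of what that could look like (not a definitive implementation): the helper name `seconds_for_folder` and the trimmed `SAMPLE` document are made up for illustration, and the namespace URI is copied from the `<rss>` element in the question. The code falls back to the standard library's ElementTree when lxml is not installed, since the calls used here exist in both.

```python
try:
    from lxml import etree  # third-party, usually faster
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls are available

# Prefix-to-URI map, copied from the <rss> element in the question.
NS = {"eq": "http://earthquake.usgs.gov/rss/1.0/"}


def seconds_for_folder(xml_bytes, folder):
    """Return the eq:seconds text of the first <item> whose <link>
    contains `folder`, or None if no item matches. (Hypothetical helper.)"""
    root = etree.fromstring(xml_bytes)
    # iter() walks the whole tree, so the nesting depth does not matter.
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        if str(folder) in link:
            return item.findtext("eq:seconds", namespaces=NS)
    return None


# Trimmed-down stand-in for the feed in the question (illustrative only).
SAMPLE = b"""<rss xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" version="2.0">
<channel>
<link>http://earthquake.usgs.gov/</link>
<item>
<link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
<eq:seconds>1395905311</eq:seconds>
</item>
</channel>
</rss>"""

print(seconds_for_folder(SAMPLE, "11199709"))
```

Because `iter("item")` searches at any depth, there is no need for positional indexing such as `root[0][1]`, which breaks as soon as the element order changes.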
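Answer 1 (score: 0)
BeautifulSoup can parse both HTML and XML (using an external module), and it is easier to use than the parsers included with Python.
This code should do what you asked: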
from bs4 import BeautifulSoup

# Load your XML file. The "xml" parser requires lxml to be installed; it is
# needed here because HTML parsers treat <link> as an empty element.
# You can also load the data from a URL by using "urllib" or "Python-Requests".
with open("filename.xml") as f:
    xml = BeautifulSoup(f, "xml")

for folder in folderList:
    for item in xml.find_all("item"):  # iterate through all <item> elements
        if item.link and str(folder) in item.link.text:  # folder ID appears in <link>
            print(item.find("eq:seconds").text)  # print the <eq:seconds> value
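If `folderList` is long, rescanning every item for each folder repeats work, and a substring test can match more than the trailing ID. One possible alternative, sketched here with only the standard library, builds a dictionary keyed by the 8 digits at the end of each `<link>`. The names `index_seconds_by_id`, `TRAILING_ID`, `SAMPLE`, and the example `folderList` values are assumptions made for illustration; the namespace URI comes from the question's `<rss>` element.

```python
import re
import xml.etree.ElementTree as ET

# Prefix-to-URI map, copied from the <rss> element in the question.
NS = {"eq": "http://earthquake.usgs.gov/rss/1.0/"}

# Captures the last 8 digits of a URL, allowing an optional trailing slash.
TRAILING_ID = re.compile(r"(\d{8})/?$")


def index_seconds_by_id(xml_text):
    """Map each item's trailing 8-digit link ID to its eq:seconds text.
    (Hypothetical helper; items without a usable link or value are skipped.)"""
    root = ET.fromstring(xml_text)
    index = {}
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        match = TRAILING_ID.search(link)
        seconds = item.findtext("eq:seconds", namespaces=NS)
        if match and seconds is not None:
            index[match.group(1)] = seconds
    return index


# Trimmed-down stand-in for the feed in the question (illustrative only).
SAMPLE = """<rss xmlns:eq="http://earthquake.usgs.gov/rss/1.0/" version="2.0">
<channel>
<link>http://earthquake.usgs.gov/</link>
<item>
<link>http://earthquake.usgs.gov/eqcenter/shakemap/ak/shake/11199709/</link>
<eq:seconds>1395905311</eq:seconds>
</item>
</channel>
</rss>"""

index = index_seconds_by_id(SAMPLE)
folderList = ["11199709", "99999999"]  # example values, not from the question
for folder in folderList:
    # .get() returns None for folders that have no matching item.
    print(folder, index.get(str(folder)))
```

The feed is parsed once, and each lookup afterwards is a dictionary access rather than another pass over the items.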