How do I parse XML in Python?

Date: 2014-05-16 14:18:35

Tags: python xml python-2.7 xpath xml-parsing

I need to extract the friendlyName from an XML document.

This is my current solution:

import urllib2
from xml.etree import ElementTree
def get_friendly_name(XMLLocation):
    root = ElementTree.fromstring(urllib2.urlopen(XMLLocation).read())
    for child in root.iter('{urn:schemas-upnp-org:device-1-0}friendlyName'):
        return child.text

Is there a better way to do this (ideally one that doesn't involve iterating)? Can I use XPath?


XML content:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="urn:schemas-upnp-org:device-1-0">
   <specVersion>
      <major>1</major>
      <minor>0</minor>
   </specVersion>
   <device>
      <dlna:X_DLNADOC xmlns:dlna="urn:schemas-dlna-org:device-1-0">DMR-1.50</dlna:X_DLNADOC>
      <deviceType>urn:schemas-upnp-org:device:MediaRenderer:1</deviceType>
      <friendlyName>My Product 912496</friendlyName>
      <manufacturer>embedded</manufacturer>
      <manufacturerURL>http://www.embedded.com</manufacturerURL>
      <modelDescription>Product</modelDescription>
      <modelName>Product</modelName>
      <modelNumber />
      <modelURL>http://www.embedded.com</modelURL>
      <UDN>uuid:93b2abac-cb6a-4857-b891-002261912496</UDN>
      <serviceList>
         <service>
            <serviceType>urn:schemas-upnp-org:service:ConnectionManager:1</serviceType>
            <serviceId>urn:upnp-org:serviceId:ConnectionManager</serviceId>
            <SCPDURL>/xml/ConnectionManager.xml</SCPDURL>
            <eventSubURL>/Event/org.mpris.MediaPlayer2.mansion/RygelSinkConnectionManager</eventSubURL>
            <controlURL>/Control/org.mpris.MediaPlayer2.mansion/RygelSinkConnectionManager</controlURL>
         </service>
         <service>
            <serviceType>urn:schemas-upnp-org:service:AVTransport:1</serviceType>
            <serviceId>urn:upnp-org:serviceId:AVTransport</serviceId>
            <SCPDURL>/xml/AVTransport2.xml</SCPDURL>
            <eventSubURL>/Event/org.mpris.MediaPlayer2.mansion/RygelAVTransport</eventSubURL>
            <controlURL>/Control/org.mpris.MediaPlayer2.mansion/RygelAVTransport</controlURL>
         </service>
         <service>
            <serviceType>urn:schemas-upnp-org:service:RenderingControl:3</serviceType>
            <serviceId>urn:upnp-org:serviceId:RenderingControl</serviceId>
            <SCPDURL>/xml/RenderingControl2.xml</SCPDURL>
            <eventSubURL>/Event/org.mpris.MediaPlayer2.mansion/RygelRenderingControl</eventSubURL>
            <controlURL>/Control/org.mpris.MediaPlayer2.mansion/RygelRenderingControl</controlURL>
         </service>
         <service>
            <serviceType>urn:schemas-embedded-com:service:RTSPGateway:1</serviceType>
            <serviceId>urn:embedded-com:serviceId:RTSPGateway</serviceId>
            <SCPDURL>/xml/RTSPGateway.xml</SCPDURL>
            <eventSubURL>/Event/org.mpris.MediaPlayer2.mansion/RygelRTSPGateway</eventSubURL>
            <controlURL>/Control/org.mpris.MediaPlayer2.mansion/RygelRTSPGateway</controlURL>
         </service>
         <service>
            <serviceType>urn:schemas-embedded-com:service:SpeakerManagement:1</serviceType>
            <serviceId>urn:embedded-com:serviceId:SpeakerManagement</serviceId>
            <SCPDURL>/xml/SpeakerManagement.xml</SCPDURL>
            <eventSubURL>/Event/org.mpris.MediaPlayer2.mansion/RygelSpeakerManagement</eventSubURL>
            <controlURL>/Control/org.mpris.MediaPlayer2.mansion/RygelSpeakerManagement</controlURL>
         </service>
         <service>
            <serviceType>urn:schemas-embedded-com:service:NetworkManagement:1</serviceType>
            <serviceId>urn:embedded-com:serviceId:NetworkManagement</serviceId>
            <SCPDURL>/xml/NetworkManagement.xml</SCPDURL>
            <eventSubURL>/Event/org.mpris.MediaPlayer2.mansion/RygelNetworkManagement</eventSubURL>
            <controlURL>/Control/org.mpris.MediaPlayer2.mansion/RygelNetworkManagement</controlURL>
         </service>
      </serviceList>
      <iconList>
         <icon>
            <mimetype>image/png</mimetype>
            <width>120</width>
            <height>120</height>
            <depth>32</depth>
            <url>/org.mpris.MediaPlayer2.mansion-120x120x32.png</url>
         </icon>
         <icon>
            <mimetype>image/png</mimetype>
            <width>48</width>
            <height>48</height>
            <depth>32</depth>
            <url>/org.mpris.MediaPlayer2.mansion-48x48x32.png</url>
         </icon>
         <icon>
            <mimetype>image/jpeg</mimetype>
            <width>120</width>
            <height>120</height>
            <depth>24</depth>
            <url>/org.mpris.MediaPlayer2.mansion-120x120x24.jpg</url>
         </icon>
         <icon>
            <mimetype>image/jpeg</mimetype>
            <width>48</width>
            <height>48</height>
            <depth>24</depth>
            <url>/org.mpris.MediaPlayer2.mansion-48x48x24.jpg</url>
         </icon>
      </iconList>
      <X_embeddedDevice xmlns:edd="schemas-embedded-com:extended-device-description">
         <firmwareVersion>v1.0 (4.155.1.15.002)</firmwareVersion>
         <features>
            <feature>
               <name>com.sony.Product</name>
               <version>1.0.0</version>
            </feature>
            <feature>
               <name>com.sony.Product.btmrc</name>
               <version>1.0.0</version>
            </feature>
            <feature>
               <name>com.sony.Product.btmrs</name>
               <version>1.0.0</version>
            </feature>
         </features>
      </X_embeddedDevice>
   </device>
</root>

3 answers:

Answer 0 (score: 0)

Pedro is right in the comments.

.find(match, namespaces=None)

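Finds the first subelement matching match. match may be a tag name or a path. Returns an element instance or None. namespaces is an optional mapping from namespace prefix to full name.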
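A minimal sketch of find() with the namespaces argument, run against a trimmed copy of the question's XML. The prefix upnp is an arbitrary name chosen for this example; any prefix works as long as it maps to urn:schemas-upnp-org:device-1-0. Because the document declares a default namespace, the unprefixed path device/friendlyName matches nothing:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the device description from the question.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="urn:schemas-upnp-org:device-1-0">
   <device>
      <friendlyName>My Product 912496</friendlyName>
   </device>
</root>"""

root = ET.fromstring(XML)

# The default namespace applies to every unprefixed tag, so a plain path fails.
print(root.find("device/friendlyName"))  # None

# "upnp" is an arbitrary prefix; only the URI it maps to has to match.
ns = {"upnp": "urn:schemas-upnp-org:device-1-0"}
name = root.find("upnp:device/upnp:friendlyName", ns).text
print(name)  # My Product 912496
```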

The ElementTree documentation is very helpful in cases like this: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.find

Edit: the link I gave in the comments leads to the following code:

import xml.etree.ElementTree as ET

data = '''<stuff>
<users>
<user x="2">
<id>001</id>
<name>Chuck</name>
</user>
<user x="7">
<id>009</id>
<name>Brent</name>
</user>
</users>
</stuff>
'''
stuff = ET.fromstring(data)
lst = stuff.findall("users/user")   # path is relative to <stuff>
print(len(lst))
for item in lst:
    print(item.attrib["x"])
item = lst[0]
ET.dump(item)
print(item.get("x"))                # get() works on attributes
print(item.find("id").text)
print(item.find("id").tag)
for user in stuff.iter("user"):     # iter() replaces the deprecated getiterator()
    print("User " + user.attrib["x"])
    ET.dump(user)

The code above uses:

item.find("id").text

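If you adapt it and remove the code you don't need, the lookup becomes the one below. Because the question's XML declares a default namespace (xmlns="urn:schemas-upnp-org:device-1-0"), an unqualified path such as 'device/friendlyName' matches nothing, so map a prefix to that namespace and use it in the path:

root.find('upnp:device/upnp:friendlyName', {'upnp': 'urn:schemas-upnp-org:device-1-0'}).text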

Instead of using an input string, you can load the XML from a file with the following (from the ElementTree documentation):

import xml.etree.ElementTree as ET
tree = ET.parse('your_file_name.xml')
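A self-contained sketch of the file-based route. Since 'your_file_name.xml' is only a placeholder, this example first writes a trimmed copy of the question's XML to a temporary file. ET.parse() returns an ElementTree rather than an Element, so getroot() is needed before calling find():

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="urn:schemas-upnp-org:device-1-0">
   <device>
      <friendlyName>My Product 912496</friendlyName>
   </device>
</root>"""

# Write the sample to a temporary file so ET.parse() has something to read.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(XML)

try:
    tree = ET.parse(path)   # returns an ElementTree, not an Element
    root = tree.getroot()   # the <root> element
    ns = {"upnp": "urn:schemas-upnp-org:device-1-0"}
    name = root.find("upnp:device/upnp:friendlyName", ns).text
finally:
    os.remove(path)

print(name)  # My Product 912496
```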

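Answer 1 (score: 0)

With ElementTree, you can either read the XML directly from a file or parse it from a string you have already loaded.

First, include the following imports: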

from xml.etree.ElementTree import ElementTree, ParseError

If you are working with a string:

from xml.etree.ElementTree import fromstring
try:
    # fromstring() returns the root Element; wrap it so tree.getroot() works below.
    tree = ElementTree(fromstring(xml_data))
except ParseError:
    print("Unable to parse XML data from string")

Otherwise, load it directly from the file:

try:
    tree = ElementTree(file="filename")
except ParseError:
    print("Unable to parse XML from file")

Once the tree is initialized, you can start extracting information:

root = tree.getroot()
ns = {'upnp': 'urn:schemas-upnp-org:device-1-0'}  # the document's default namespace
print(root.find('upnp:device/upnp:friendlyName', ns).text)
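ElementTree reports malformed input as xml.etree.ElementTree.ParseError (a subclass of SyntaxError), not as xml.parsers.expat.ExpatError, so ParseError is the exception worth catching. A small sketch of the string route with error handling; the helper name friendly_name_from_string is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError

NS = {"upnp": "urn:schemas-upnp-org:device-1-0"}


def friendly_name_from_string(xml_data):
    """Hypothetical helper: return the friendlyName text, or None if the XML is malformed."""
    try:
        # fromstring() returns the root Element directly (it has no getroot()).
        root = ET.fromstring(xml_data)
    except ParseError as err:
        # err.position is a (line, column) tuple locating the problem.
        print("Unable to parse XML data from string:", err.position)
        return None
    return root.find("upnp:device/upnp:friendlyName", NS).text


good = ('<root xmlns="urn:schemas-upnp-org:device-1-0"><device>'
        '<friendlyName>My Product 912496</friendlyName></device></root>')
bad = "<root><device></root>"  # mismatched closing tag

print(friendly_name_from_string(good))  # My Product 912496
print(friendly_name_from_string(bad))   # prints the error position, then None
```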

Answer 2 (score: 0)

import urllib2
import xml.etree.ElementTree as ElementTree

namespace = '{urn:schemas-upnp-org:device-1-0}'

def get_friendly_name(XMLLocation):
    root = ElementTree.fromstring(urllib2.urlopen(XMLLocation).read())
    # './/' selects matching elements at any depth below root.
    return root.find('.//{}friendlyName'.format(namespace)).text

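find() stops at the first match. To get all elements matching the path expression, use findall(). Note that ElementTree supports only a limited subset of XPath.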
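A sketch contrasting find() and findall() on a trimmed copy of the question's XML (two of the six service entries kept). It also shows findtext(), another Element method not used in the answers above, which returns a default instead of failing when the path matches nothing:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, keeping two <service> entries.
XML = b"""<root xmlns="urn:schemas-upnp-org:device-1-0">
   <device>
      <friendlyName>My Product 912496</friendlyName>
      <serviceList>
         <service>
            <serviceType>urn:schemas-upnp-org:service:ConnectionManager:1</serviceType>
         </service>
         <service>
            <serviceType>urn:schemas-upnp-org:service:AVTransport:1</serviceType>
         </service>
      </serviceList>
   </device>
</root>"""

namespace = "{urn:schemas-upnp-org:device-1-0}"
root = ET.fromstring(XML)

# find(): first match only; ".//" searches all descendants.
first = root.find(".//{}serviceType".format(namespace)).text

# findall(): every match, in document order.
all_types = [e.text for e in root.findall(".//{}serviceType".format(namespace))]

# findtext(): like find(...).text, but returns the default instead of
# raising AttributeError when nothing matches.
missing = root.findtext(".//{}noSuchTag".format(namespace), default="(not found)")

print(first)
print(all_types)
print(missing)
```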