Question

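I want to write a small program that reads data from XML files and writes it to CSV. I usually use ElementTree.

The XML files come from a phone app and always look like this: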

<waypoint><name><![CDATA[POI 2017-07-03 09:37:11nass]]></name> 
<coord lat="47.220430" lon="8.951071"/></waypoint>

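Accessing the coord element and its contents (latitude and longitude) is no problem. But how do I access the information in name: [CDATA[POI 2017-07-03 09:37:11nass]]?

So far my code looks like this: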

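import xml.etree.ElementTree as etree

# POIS (file names) and rootwayp (directory prefix) are assumed to be defined earlier.
for poi in POIS:
    tree = etree.parse(rootwayp + poi)
    root = tree.getroot()
    for child in root:
        for childchild in child:
            print(childchild.tag, ':', childchild.attrib)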

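I think I need a different way of reading the name content, because the information there is not stored in attributes. I tried accessing it as a child of name, which did not work (maybe because of the ! inside the brackets). What exactly does the ! in <!...> mean?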

Answer 1

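<![CDATA[...]]> is a special marked section: a CDATA section, whose content the parser treats as plain character data rather than markup.

You can use the following selectors to extract the details you need: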
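
To see why no special reading method is needed, here is a minimal sketch using the standard-library ElementTree. The sample strings are made up for illustration; they are not from the question's file. It shows that CDATA content ends up in the element's ordinary text and creates no child elements.

```python
import xml.etree.ElementTree as ET

# The same text, once wrapped in a CDATA section and once with escaped markup.
# Both sample strings are illustrative assumptions.
with_cdata = ET.fromstring("<name><![CDATA[a < b & c]]></name>")
escaped = ET.fromstring("<name>a &lt; b &amp; c</name>")

cdata_text = with_cdata.text
same_text = with_cdata.text == escaped.text
child_count = len(with_cdata)  # number of child elements

print(cdata_text)   # a < b & c
print(same_text)    # True
print(child_count)  # 0
```

That is why iterating over the children of name finds nothing: the CDATA section is not an element, only another way of writing text.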

root = tree.getroot()

print(root.find('name').text)
print(root.find('coord').attrib.get('lat','n/a'))
print(root.find('coord').attrib.get('lon','n/a'))

# Output
POI 2017-07-03 09:37:11nass
47.220430
8.951071
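
Since the goal in the question is a CSV file, the lookups above could be combined with the csv module roughly like this. It is only a sketch: the helper names waypoint_to_row and rows_to_csv and the column headers are my own choices, and it assumes every waypoint has a coord element, as the question states.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<waypoint><name><![CDATA[POI 2017-07-03 09:37:11nass]]></name>
<coord lat="47.220430" lon="8.951071"/></waypoint>"""


def waypoint_to_row(xml_text):
    """Return [name, lat, lon] for one <waypoint> document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    coord = root.find("coord")  # assumed to always be present
    # findtext returns the element's text, which already includes CDATA content.
    return [root.findtext("name", default="n/a"),
            coord.get("lat", "n/a"),
            coord.get("lon", "n/a")]


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "lat", "lon"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([waypoint_to_row(SAMPLE)])
print(csv_text)
```

To write a real file, pass a file opened with open(path, "w", newline="") to csv.writer instead of the StringIO buffer.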

With lxml, you can also extract the entire CDATA section; the lxml documentation covers this.

