How to return a list of Folder elements in this KML file?

Date: 2017-05-30 22:45:25

Tags: python xml xpath lxml kml

Here is the top of the file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <name>Points</name>
      <Placemark>
        <name>Port Saeed, Dubai</name>
        <styleUrl>#icon-1899-0288D1-nodesc</styleUrl>
        <Point>
          <coordinates>
            55.3295568,25.2513145,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark>
        <name>Retail Location #1</name>
        <description>Paris, France</description>
        <styleUrl>#icon-1899-0288D1</styleUrl>
        <Point>
          <coordinates>
            2.3620605,48.8867304,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark>
        <name>Odessa Oblast</name>
...

I want to extract the Folder elements.

Here is my code:

import xml.etree.ElementTree as ET

tree = ET.parse(kml)
root = tree.getroot()

for element in root:
    print(element.findall('.//{http://www.opengis.net/kml/2.2/}Folder'))

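Right now this prints []. I think it's a namespace problem, but I can't figure out how to build that string. Or would it be worth using XPath instead? I suspect I'd run into the same namespace problem there.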
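A minimal sketch, using only the standard library, of two ways to build the namespaced name instead of typing it by hand. ElementTree stores a namespaced tag as {uri}localname, so the URI can be read off the root element; alternatively, a prefix-to-URI mapping can be passed as the namespaces argument of findall. The trimmed sample document (including the second "Regions" folder) and the prefix kml are hypothetical choices for illustration, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed-down sample with the same default namespace as the question.
KML = '''<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder><name>Points</name></Folder>
    <Folder><name>Regions</name></Folder>
  </Document>
</kml>'''

root = ET.fromstring(KML)

# root.tag is '{http://www.opengis.net/kml/2.2}kml'; strip the braces to get the URI.
ns_uri = root.tag[1:].split('}')[0]

# Option 1: build the '{uri}Folder' string from the recovered URI.
folders = root.findall('.//{%s}Folder' % ns_uri)

# Option 2: map a prefix of your choosing to the URI and use it in the path.
ns = {'kml': ns_uri}
folders_by_prefix = root.findall('.//kml:Folder', ns)

print([f.find('kml:name', ns).text for f in folders_by_prefix])
# → ['Points', 'Regions']
```

The prefix in option 2 is arbitrary; only the URI it maps to has to match the document. With lxml, the same kind of mapping can be passed to tree.xpath('//kml:Folder', namespaces=ns); XPath 1.0 has no syntax for a default namespace, so a prefix is needed there as well.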

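1 answer:

Answer 0 (score: 1)

Consider iterating over all descendants of Folder, since this node contains both child and grandchild elements. Also, the namespace URI used in the search path must not end with a forward slash: it has to match the document's xmlns value, http://www.opengis.net/kml/2.2, exactly.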
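The trailing-slash point can be checked in isolation. This minimal sketch uses a hypothetical one-folder document with the question's namespace and runs the same search with and without the slash; namespace names are compared as exact strings, so the version with the extra "/" refers to a different namespace and matches nothing.

```python
import xml.etree.ElementTree as ET

# Minimal hypothetical document using the same default namespace as the question.
root = ET.fromstring('<kml xmlns="http://www.opengis.net/kml/2.2">'
                     '<Document><Folder/></Document></kml>')

# With the trailing slash the URI no longer matches the document's xmlns.
with_slash = root.findall('.//{http://www.opengis.net/kml/2.2/}Folder')
# Without it, the exact URI matches and the Folder is found.
without_slash = root.findall('.//{http://www.opengis.net/kml/2.2}Folder')

print(len(with_slash), len(without_slash))  # → 0 1
```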

import xml.etree.ElementTree as ET

root = ET.fromstring('''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <name>Points</name>
      <Placemark>
        <name>Port Saeed, Dubai</name>
        <styleUrl>#icon-1899-0288D1-nodesc</styleUrl>
        <Point>
          <coordinates>
        55.3295568,25.2513145,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark>
        <name>Retail Location #1</name>
        <description>Paris, France</description>
        <styleUrl>#icon-1899-0288D1</styleUrl>
        <Point>
          <coordinates>
        2.3620605,48.8867304,0
          </coordinates>
        </Point>
      </Placemark>
    </Folder>
  </Document>
</kml>''')

# FIND ALL FOLDERS
for i in root.findall('.//{http://www.opengis.net/kml/2.2}Folder'):
    # FIND ALL FOLDER'S DESCENDANTS
    for inner in i.findall('.//*'):
        data = (inner.text or '').strip()   # STRIP LEAD/TRAIL WHITESPACE (text may be None)
        if data:                            # LEAVE OUT EMPTY ELEMENTS
            print(data)

# Points
# Port Saeed, Dubai
# #icon-1899-0288D1-nodesc
# 55.3295568,25.2513145,0
# Retail Location #1
# Paris, France
# #icon-1899-0288D1
# 2.3620605,48.8867304,0

For a nested list, append the node text to an inner list per Folder, so that each inner list corresponds to one Folder:

data = []
for i in root.findall('.//{http://www.opengis.net/kml/2.2}Folder'):
    inner = []
    for t in i.findall('.//*'):
        txt = (t.text or '').strip()   # text may be None for empty elements
        if txt:                        # skip whitespace-only nodes
            inner.append(txt)

    data.append(inner)
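A possible variant of the nested-list loop, not the answer's own method: Element.itertext() walks every text and tail string under a Folder in document order, so the inner loop reduces to a list comprehension. It differs from the findall('.//*') version in that it also picks up tail text (text after a closing tag); in pretty-printed KML that is only whitespace and is stripped away. The function name folder_texts and the trimmed sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = {'kml': 'http://www.opengis.net/kml/2.2'}

def folder_texts(root):
    """Return one list of non-blank text values per Folder, in document order."""
    result = []
    for folder in root.findall('.//kml:Folder', KML_NS):
        # itertext() yields each non-empty text/tail string below the Folder;
        # indentation whitespace strips to '' and is filtered out.
        result.append([t.strip() for t in folder.itertext() if t.strip()])
    return result

root = ET.fromstring('''<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <name>Points</name>
      <Placemark>
        <name>Port Saeed, Dubai</name>
        <Point><coordinates>55.3295568,25.2513145,0</coordinates></Point>
      </Placemark>
    </Folder>
  </Document>
</kml>''')

print(folder_texts(root))
# → [['Points', 'Port Saeed, Dubai', '55.3295568,25.2513145,0']]
```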