How do I save the position of a node in an XML tree for later use?

Posted: 2016-10-27 15:56:52

Tags: python xml parsing

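I have a parsed XML tree and have used the <url> nodes to find the most recently added <lastmod> node. How do I "save" that node's position in the tree and use it to get the other nodes in the <url> it belongs to?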
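With ElementTree, a parsed node is an ordinary Python object, so "saving its position" can simply mean keeping a reference to the enclosing <url> element while looping, or recording that element's index among the root's children. Standard-library ElementTree elements have no link back to their parent, so the reference has to be taken at the <url> level rather than from the <lastmod> node. This is a minimal sketch that assumes `xml.etree.ElementTree` (the question does not say which parser `get_xml_data` uses). It uses a made-up, namespace-free two-entry document and assumes every timestamp has the same fixed-width ISO 8601 form, so that plain string comparison orders the timestamps correctly:

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free document used only for illustration
XML = """<urlset>
  <url><loc>https://example.com/a</loc><lastmod>2016-09-15T07:11:22Z</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2016-10-13T06:03:41Z</lastmod></url>
</urlset>"""

root = ET.fromstring(XML)

newest_url = None    # reference to the <url> element itself
newest_index = None  # alternatively, its position among root's children
newest_stamp = ""

for index, url in enumerate(root):
    lastmod = url.find("lastmod")
    if lastmod is None:
        continue  # this <url> has no <lastmod>
    # Same-format ISO 8601 timestamps sort correctly as plain strings
    if lastmod.text > newest_stamp:
        newest_stamp = lastmod.text
        newest_url = url
        newest_index = index

# Either handle leads back to the same <url>, so its other children are reachable
print(newest_url.find("loc").text)       # https://example.com/b
print(root[newest_index] is newest_url)  # True
```

Keeping the element reference is usually the simpler choice. The index only stays valid as long as the tree is not modified.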

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.website.com/</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-10-13T06:03:41Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-09-15T07:11:22Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
</urlset>

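Of the two <url> entries that have a <lastmod>, the first one (2016-10-13) is the newest addition to the XML document. However, you have to loop through the entire document to find that out. How do you "save" the location of that XML tag so that you can later get its <image:title>? Here is my code: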
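Once the newest <url> element is held, its <image:title> can be read with a path relative to that element. Because the sitemap uses a default namespace plus the `image:` prefix, ElementTree needs a prefix-to-URI map for these lookups. This sketch assumes `xml.etree.ElementTree`. The prefix name `sm` is an arbitrary local choice, and the document is a trimmed copy of the sample above. Here, `root.find` simply stands in for however the <url> element was saved:

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the URIs are the ones declared on <urlset> in the sitemap.
# "sm" is an arbitrary local name for the default (unprefixed) namespace.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-10-13T06:03:41Z</lastmod>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
</urlset>"""

root = ET.fromstring(XML)
url = root.find("sm:url", NS)  # stands in for the saved <url> element

# Paths are relative to the saved element, so only its own subtree is searched
loc = url.findtext("sm:loc", namespaces=NS)
title = url.findtext("image:image/image:title", namespaces=NS)
print(loc, title)  # https://www.website.com/location/ Title of Item
```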

tree = get_xml_data(line)
jul_newest = 0.0  # establish a comparison value for the newest addition
for child in tree:
    if child.tag.endswith("url"):
        for c in child:
            if c.tag.endswith("lastmod"):
                xml_date = c.text
                year = float(xml_date[0:4])
                month = float(xml_date[5:7])
                day = float(xml_date[8:10])
                hour = float(xml_date[11:13])
                minute = float(xml_date[14:16])
                second = float(xml_date[17:19])
                # calculate Julian day number of recent addition
                jul_day = julian(year, month, day, hour, minute, second)
                if jul_day > jul_newest:
                    nt.set_year(int(year))
                    nt.set_month(int(month))
                    nt.set_day(int(day))
                    nt.set_hour(int(hour))
                    nt.set_minute(int(minute))
                    nt.set_second(int(second))
                    jul_newest = jul_day
                    nt.set_jul(jul_day)
# find loc of the latest addition
# (as written, this sets the location once per <url>, so the last <url> wins)
for child in tree:
    if child.tag.endswith("url"):
        for c in child:
            if c.tag.endswith("loc"):
                nt.set_location(c.text)
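The code above compares Julian day numbers built from sliced strings, and its second loop cannot tell which <url> won the first loop. A single-pass alternative keeps the winning <url> element together with its timestamp, so <loc> and <image:title> can be read from it directly. This sketch makes two assumptions: that `xml.etree.ElementTree` is used, and that every <lastmod> has the form `2016-10-13T06:03:41Z`. It parses timestamps with `datetime.strptime` instead of the question's `julian` helper. The names `NS`, `parse_lastmod`, and `newest_url` are hypothetical, and `nt` is left out:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# "sm" is an arbitrary local prefix for the default sitemap namespace;
# the URIs are the ones declared in the question's sitemap.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def parse_lastmod(text):
    """Parse a <lastmod> value such as 2016-10-13T06:03:41Z.

    Assumes the UTC 'Z' form used in the question; date-only values or
    numeric offsets would need a different format string.
    """
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def newest_url(root):
    """Return (timestamp, <url> element) for the newest <lastmod>, or None."""
    dated = []
    for url in root.findall("sm:url", NS):
        stamp = url.findtext("sm:lastmod", namespaces=NS)
        if stamp:  # <url> entries without <lastmod> are skipped
            dated.append((parse_lastmod(stamp), url))
    if not dated:
        return None
    # key= compares timestamps only, never the Element objects
    return max(dated, key=lambda pair: pair[0])


# The question's sitemap, without the XML declaration
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.website.com/</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-10-13T06:03:41Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://www.website.com/location/</loc>
    <lastmod>2016-09-15T07:11:22Z</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>https://website.com/image/</image:loc>
      <image:title>Title of Item</image:title>
    </image:image>
  </url>
</urlset>"""

root = ET.fromstring(SITEMAP)
when, url = newest_url(root)
print(when)  # 2016-10-13 06:03:41
print(url.findtext("sm:loc", namespaces=NS))
print(url.findtext("image:image/image:title", namespaces=NS))
```

If the `nt` setters are still needed, they could be fed from `when.year`, `when.month`, `when.day`, and so on, instead of from the sliced strings.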
