How can I get some values from XML in Python?

Asked: 2018-10-25 13:10:55

Tags: python xml elementtree

I have this sitemap in XML. How can I get each <loc>?

<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->


<url>
  <loc>https://www.nsnam.org/wiki/Main_Page</loc>
  <lastmod>2018-10-24T03:03:05+00:00</lastmod>
  <priority>1.00</priority>
</url>
<url>
  <loc>https://www.nsnam.org/wiki/Current_Development</loc>
  <lastmod>2018-10-24T03:03:05+00:00</lastmod>
  <priority>0.80</priority>
</url>
<url>
  <loc>https://www.nsnam.org/wiki/Developer_FAQ</loc>
  <lastmod>2018-10-24T03:03:05+00:00</lastmod>
  <priority>0.80</priority>
</url>
</urlset>

The program looks like this.

import os.path
import xml.etree.ElementTree
import requests
from subprocess import call

def creatingListOfBrokenLinks():
    if (os.path.isfile('sitemap.xml')):
        e = xml.etree.ElementTree.parse('sitemap.xml').getroot()
        file = open("all_broken_links.txt", "w")

        for atype in e.findall('url'):
            r = requests.get(atype.find('loc').text)
            print(atype)
            if (r.status_code == 404):
                file.write(atype)

        file.close()


if __name__ == "__main__":
    creatingListOfBrokenLinks()

2 Answers:

Answer 0 (score: 0)

I suggest you use the ElementTree package from the standard library (xml.etree.ElementTree).
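As a minimal sketch of that suggestion (not necessarily the code this answer originally showed), the snippet below collects every <loc> value by iterating over the whole tree with a namespace-qualified tag. The names SITEMAP_NS, SAMPLE, and iter_locs are hypothetical, and the sample is a shortened copy of the sitemap from the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared on <urlset> in the question's sitemap.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Shortened copy of the sitemap from the question (illustrative only).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
  <loc>https://www.nsnam.org/wiki/Main_Page</loc>
  <priority>1.00</priority>
</url>
<url>
  <loc>https://www.nsnam.org/wiki/Current_Development</loc>
  <priority>0.80</priority>
</url>
</urlset>"""


def iter_locs(xml_text):
    """Return the text of every <loc> element, wherever it sits in the tree."""
    root = ET.fromstring(xml_text)
    # ElementTree stores tags as "{namespace-uri}localname".
    return [el.text.strip() for el in root.iter("{%s}loc" % SITEMAP_NS)]


locs = iter_locs(SAMPLE)
for link in locs:
    print(link)
```

For a file on disk, ET.parse('sitemap.xml').getroot() could stand in for ET.fromstring here.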

Documentation link:

Update:

  • What goes wrong in your code is the XML namespace handling.
  • In addition, my example fetches the <loc> elements directly instead of going through findall / find. Depending on the structure of the XML and the use case, this may or may not work for you.
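The namespace point can be checked with a short experiment. This is only a sketch: the NS mapping and its "sm" prefix are arbitrary names chosen for illustration, and SAMPLE is a shortened copy of the sitemap from the question.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the prefix "sm" is an arbitrary choice for this sketch.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
  <loc>https://www.nsnam.org/wiki/Main_Page</loc>
</url>
<url>
  <loc>https://www.nsnam.org/wiki/Developer_FAQ</loc>
</url>
</urlset>"""

root = ET.fromstring(SAMPLE)

# The default namespace becomes part of every tag name.
root_tag = root.tag

# A bare tag name therefore matches nothing.
plain_matches = root.findall("url")

# Passing a prefix map lets the path use a short prefix instead.
prefixed_matches = root.findall("sm:url", NS)
prefixed_locs = [u.find("sm:loc", NS).text for u in prefixed_matches]

print(root_tag)
print(len(plain_matches), len(prefixed_matches))
print(prefixed_locs)
```

With the question's code, findall('url') finds no elements for the same reason, so the loop body never runs.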

Answer 1 (score: 0)

Your code works fine for me. All you have to do is add {http://www.sitemaps.org/schemas/sitemap/0.9} before url and loc.

Here:

import os.path
import xml.etree.ElementTree
import requests
from subprocess import call

def creatingListOfBrokenLinks():
    if (os.path.isfile('sitemap.xml')):
        e = xml.etree.ElementTree.parse('sitemap.xml').getroot()
        file = open("all_broken_links.txt", "w")

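        # Tag names must carry the sitemap namespace in {uri}tag form.
        for atype in e.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
            link = atype.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
            r = requests.get(link)
            print(atype)
            if (r.status_code == 404):
                # write() needs a string, so record the URL rather than the Element.
                file.write(link + "\n")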

        file.close()


if __name__ == "__main__":
    creatingListOfBrokenLinks()
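
One possible variation, sketched under assumptions rather than taken from the answer: keeping the XML handling separate from the HTTP call makes the namespace logic easy to try without touching the network. The names find_broken_links, get_status, SAMPLE, and fake_statuses are hypothetical, and the status codes below are made up for illustration, not real results for those pages.

```python
import xml.etree.ElementTree as ET

# Prefix map for the sitemap namespace; "sm" is an arbitrary prefix.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_broken_links(xml_text, get_status):
    """Return the <loc> URLs for which get_status(url) reports 404.

    get_status is any callable that maps a URL to an HTTP status code.
    """
    root = ET.fromstring(xml_text)
    broken = []
    for url in root.findall("sm:url", NS):
        loc = url.find("sm:loc", NS)
        if loc is None or not loc.text:
            continue  # skip <url> entries without a usable <loc>
        link = loc.text.strip()
        if get_status(link) == 404:
            broken.append(link)
    return broken


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
  <loc>https://www.nsnam.org/wiki/Main_Page</loc>
</url>
<url>
  <loc>https://www.nsnam.org/wiki/Developer_FAQ</loc>
</url>
</urlset>"""

# Made-up status codes so the sketch runs offline.
fake_statuses = {
    "https://www.nsnam.org/wiki/Main_Page": 200,
    "https://www.nsnam.org/wiki/Developer_FAQ": 404,
}

broken = find_broken_links(SAMPLE, lambda link: fake_statuses[link])
report = "\n".join(broken)
print(report)
```

In a real run, a callable such as lambda link: requests.get(link).status_code could be passed as get_status, and report could be written to all_broken_links.txt.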