Parsing XML with multiple namespaces in Python using lxml

Date: 2019-03-05 12:16:47

Tags: python xml parsing namespaces lxml

I want to extract the value of <ta:level> under <ta:warnings> for every item in the XML below. I have tried the existing solutions I found online, but none of them worked for me. Basically, my XML contains multiple namespaces.

<?xml-stylesheet href="/Style Library/st/xslt/rss2.xsl" type="text/xsl" media="screen" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Travel Advisories</title>
    <link>http://smartraveller.gov.au/countries/</link>
    <description>the Australian Department of Foreign Affairs and Trade's Smartraveller advisory service</description>
    <language>en</language>
    <webMaster>webmaster@dfat.gov.au</webMaster>
    <copyright>Copyright Commonwealth of Australia 2011</copyright>
    <ttl>60</ttl>
    <atom:link href="http://smartraveller.gov.au/countries/Documents/index.rss" rel="self" type="application/rss+xml" />
    <generator>zcms</generator>
    <image>
      <title>Advice</title>
      <link>http://smartraveller.gov.au/countries/</link>
      <url>/Style Library/st/images/dfat_logo_small.gif</url>
    </image>
    <item>
      <title>Czech Republic</title>
      <description>This travel advice has been reviewed. The level of our advice has not changed. Exercise normal safety precautions in the Czech Republic.</description>
      <link>http://smartraveller.gov.au/Countries/europe/eastern/Pages/czech_republic.aspx</link>
      <pubDate>26 Oct 2018 05:25:14 GMT</pubDate>
      <guid isPermaLink="false">cdbcc3d4-3a89-4768-ac1d-0221f8c99227 GMT</guid>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
        <dc:description>Exercise normal safety precautions</dc:description>
      </ta:warnings>
    </item>

1 Answer:

Answer 0: (score: 0)

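The XML has multiple namespaces, but the only one you need to worry about is http://www.smartraveller.gov.au/schema/rss/travel_advisories/

That is because the only namespaced elements on the path to your target are ta:warnings and ta:level; channel and item are in no namespace.

Example...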

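from lxml import etree
import requests

# Fetch the feed. req.content is the raw bytes of the response, which lxml
# can parse even when the document carries an XML encoding declaration.
req = requests.get("https://smartraveller.gov.au/countries/documents/index.rss")
req.raise_for_status()

tree = etree.fromstring(req.content)

# Only the 'ta' namespace appears in the path, so it is the only one to map.
ns = {'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}

# channel and item are un-namespaced; ta:warnings and ta:level use the prefix.
e = tree.findall('channel/item/ta:warnings/ta:level', ns)
for i in e:
    print(i.text)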

Prints...

2/5
2/5
4/5
2/5
...and so on
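
The namespace point above can be checked offline. The following is a minimal sketch, not part of the original answer: it uses a trimmed copy of the XML from the question (SAMPLE is an assumption made for illustration) and falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since both accept the same namespace mapping in findall(). It suggests that the prefix used in the mapping is arbitrary, that the namespace URI is what is matched, and that a path without the namespace finds nothing.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

# Trimmed copy of the feed from the question (illustrative sample only).
SAMPLE = b"""<rss version="2.0"
    xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
      </ta:warnings>
    </item>
  </channel>
</rss>"""

TA = "http://www.smartraveller.gov.au/schema/rss/travel_advisories/"
root = etree.fromstring(SAMPLE)

# Same prefix as the document uses.
with_prefix = [e.text for e in root.findall(
    "channel/item/ta:warnings/ta:level", {"ta": TA})]

# A different prefix works too: only the URI it maps to is compared.
other_prefix = [e.text for e in root.findall(
    "channel/item/x:warnings/x:level", {"x": TA})]

# Clark notation ({uri}localname) needs no mapping at all.
clark = [e.text for e in root.findall(
    "channel/item/{%s}warnings/{%s}level" % (TA, TA))]

# Without the namespace, the path matches nothing.
no_ns = root.findall("channel/item/warnings/level")

print(with_prefix, other_prefix, clark, no_ns)
```

An empty result from a path like the last one is the usual symptom when a namespaced element is queried without its namespace.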

If you want the values as a list of strings rather than a list of elements, consider switching from findall() to xpath() ...

e = tree.xpath('channel/item/ta:warnings/ta:level/text()', namespaces=ns)
print(e)

Prints...

['2/5', '2/5', '4/5', '2/5', and so on...]
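
If each level needs to stay attached to its country, one option is to loop over the item elements and run the namespaced lookup relative to each one. This is only a sketch under assumptions: levels_by_country and SAMPLE are hypothetical names, the second item ("Examplestan") is made-up data, and the import falls back to xml.etree.ElementTree, whose findall() and findtext() take the same namespace mapping as lxml's.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

NS = {"ta": "http://www.smartraveller.gov.au/schema/rss/travel_advisories/"}

# Illustrative sample; the second item is invented for the example.
SAMPLE = b"""<rss version="2.0"
    xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings><ta:level>2/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>Examplestan</title>
      <ta:warnings><ta:level>4/5</ta:level></ta:warnings>
    </item>
  </channel>
</rss>"""


def levels_by_country(xml_bytes):
    """Map each item's <title> to the text of its <ta:level>."""
    root = etree.fromstring(xml_bytes)
    result = {}
    for item in root.findall("channel/item"):
        country = item.findtext("title")  # un-namespaced child
        # Relative path from the item; returns None if the element is absent.
        level = item.findtext("ta:warnings/ta:level", namespaces=NS)
        result[country] = level
    return result


levels = levels_by_country(SAMPLE)
print(levels)
```

Looking up the level relative to each item keeps titles and levels aligned even if some item lacks a ta:warnings block, which two separate flat lists would not guarantee.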