Question

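Working in Python, my goal is to parse an XML document I created and build a nested list of lists, so that I can access them later and parse the feeds. The XML document looks like the following snippet: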

<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>

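I would like to have something like nested lists, where the innermost list holds the content between the <f></f> tags, and the list above it is created with the name of the source, e.g. source="reuters" would become reuters. Retrieving the information from the XML document is not a problem; I'm doing it with elementtree, looping over the nodes and retrieving values with node.get('source') and so on. The problem is that I can't generate lists with the required names and the different lengths needed for the different sources. I have tried append, but I'm not sure how to append to a list using the retrieved name. Would a dictionary be better? What is the best practice in this situation? How can I make this work? If more information is needed, just post a comment and I'll be sure to add it.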

Answer 1

Based on your description, a dictionary whose keys are the source names and whose values are the lists of feeds might do the trick.

Here is one way to construct such a beast:

from lxml import etree
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')}

pprint(news_sources)

Another example, without lxml or xpath:

import xml.etree.ElementTree as ET
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()}

pprint(news_sources)

Finally, if you are allergic to list comprehensions:

import xml.etree.ElementTree as ET
from pprint import pprint

xml = ET.parse('x.xml')
root = xml.getroot()
news_sources = {}
for sourceList in root:
    sourceListName = sourceList.attrib['source']
    news_sources[sourceListName] = []
    for feed in sourceList:
        feedName = feed.text
        news_sources[sourceListName].append(feedName)

pprint(news_sources)
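
To tie this back to the "access them later" part of the question, here is a minimal self-contained sketch that parses an inline copy of a shortened sample instead of x.xml, then looks feeds up by source name. The names SAMPLE_XML, load_sources, bbc_feeds and nested are made up for illustration and are not from the question; it also assumes every sourceList carries a source attribute.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened inline copy of the question's document,
# used here only so the example runs without a file on disk.
SAMPLE_XML = """<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>"""


def load_sources(xml_text):
    """Map each sourceList's 'source' attribute to the list of its <f> URLs."""
    root = ET.fromstring(xml_text)
    return {
        # (feed.text or "") guards against an empty <f/> element.
        source.get("source"): [(feed.text or "").strip() for feed in source.findall("f")]
        for source in root.findall("sourceList")
    }


news_sources = load_sources(SAMPLE_XML)

# Look up one institution's feeds by name; each list can have its own length.
bbc_feeds = news_sources["bbc"]

# If a nested list is really wanted, it can be derived from the dictionary.
nested = [[name, feeds] for name, feeds in news_sources.items()]
```

The dictionary sidesteps the "list with a retrieved name" problem: the name becomes a key rather than a variable name, so news_sources["reuters"] works without knowing the sources in advance.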

Generating nested lists from an XML doc
