Extracting XML data from a URL with BeautifulSoup and outputting it to a dictionary

Date: 2019-01-14 17:01:22

Tags: python xml

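I need to read XML data from a URL (an exchange rate list) and output it as a dictionary. Right now I can only get the first currency. I tried using find_all, but without success. Can someone comment on where I need to place the for loop to read all the values?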

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.xxxy.hr/Downloads/PBZteclist.xml').read()
soup = bs.BeautifulSoup(source,'xml')

name = soup.find('Name').text
unit = soup.find('Unit').text
buyratecache = soup.find('BuyRateCache').text
buyrateforeign = soup.find('BuyRateForeign').text
meanrate = soup.find('MeanRate').text
sellrateforeign = soup.find('SellRateForeign').text
sellratecache = soup.find('SellRateCache').text


devize = {'naziv_valute': name,
          'jedinica': unit,
          'kupovni': buyratecache,
          'kupovni_strani': buyrateforeign,
          'srednji': meanrate,
          'prodajni_strani': sellrateforeign,
          'prodajni': sellratecache}

print("devize:", devize)

XML sample:

<ExchRates>
    <ExchRate>
        <Bank>Privredna banka Zagreb</Bank>
        <CurrencyBase>HRK</CurrencyBase>
        <Date>12.01.2019.</Date>
        <Currency Code="036">
            <Name>AUD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,485390</BuyRateCache>
            <BuyRateForeign>4,530697</BuyRateForeign>
            <MeanRate>4,646869</MeanRate>
            <SellRateForeign>4,786275</SellRateForeign>
            <SellRateCache>4,834138</SellRateCache>
        </Currency>
        <Currency Code="124">
            <Name>CAD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,724225</BuyRateCache>
            <BuyRateForeign>4,771944</BuyRateForeign>
            <MeanRate>4,869331</MeanRate>
            <SellRateForeign>4,991064</SellRateForeign>
            <SellRateCache>5,040975</SellRateCache>
        </Currency>
        <Currency Code="203">
            <Name>CZK</Name>
            <Unit>1</Unit>
            <BuyRateCache>0,280057</BuyRateCache>
            <BuyRateForeign>0,284322</BuyRateForeign>
            <MeanRate>0,290124</MeanRate>
            <SellRateForeign>0,297377</SellRateForeign>
            <SellRateCache>0,300351</SellRateCache>
        </Currency>
        ...etc...
    </ExchRate>
</ExchRates>

1 answer:

Answer 0 (score: 0)

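Simply iterate over all the Currency nodes (rather than calling find on the soup object, which only returns the first match). You can even use a list comprehension to build a list of dictionaries: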

soup = bs.BeautifulSoup(source, 'xml')

# ALL EXCHANGE RATE NODES
currency_nodes = soup.find_all('Currency')

# LIST OF DICTIONARIES (ONE PER CURRENCY)
devize_list = [{'naziv_valute': c.find('Name').text,
                'jedinica': c.find('Unit').text,
                'kupovni': c.find('BuyRateCache').text,
                'kupovni_strani': c.find('BuyRateForeign').text,
                'srednji': c.find('MeanRate').text,
                'prodajni_strani': c.find('SellRateForeign').text,
                'prodajni': c.find('SellRateCache').text
               } for c in currency_nodes]
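
Since the question asks where the for loop goes, the same idea can also be written as an explicit loop. The following is only a minimal sketch, not the answerer's code: it parses a shortened copy of the XML sample with the standard library's xml.etree.ElementTree instead of BeautifulSoup, so that it runs without a download or extra packages. FIELD_MAP and currency_to_dict are hypothetical names introduced for illustration. Because currency_to_dict only calls node.find(tag).text, it should also accept the Tag objects returned by soup.find_all('Currency').

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML sample from the question (two currencies only).
SAMPLE_XML = """<ExchRates>
    <ExchRate>
        <Bank>Privredna banka Zagreb</Bank>
        <CurrencyBase>HRK</CurrencyBase>
        <Date>12.01.2019.</Date>
        <Currency Code="036">
            <Name>AUD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,485390</BuyRateCache>
            <BuyRateForeign>4,530697</BuyRateForeign>
            <MeanRate>4,646869</MeanRate>
            <SellRateForeign>4,786275</SellRateForeign>
            <SellRateCache>4,834138</SellRateCache>
        </Currency>
        <Currency Code="124">
            <Name>CAD</Name>
            <Unit>1</Unit>
            <BuyRateCache>4,724225</BuyRateCache>
            <BuyRateForeign>4,771944</BuyRateForeign>
            <MeanRate>4,869331</MeanRate>
            <SellRateForeign>4,991064</SellRateForeign>
            <SellRateCache>5,040975</SellRateCache>
        </Currency>
    </ExchRate>
</ExchRates>"""

# Hypothetical mapping: output dictionary key -> XML tag name.
FIELD_MAP = {
    'naziv_valute': 'Name',
    'jedinica': 'Unit',
    'kupovni': 'BuyRateCache',
    'kupovni_strani': 'BuyRateForeign',
    'srednji': 'MeanRate',
    'prodajni_strani': 'SellRateForeign',
    'prodajni': 'SellRateCache',
}


def currency_to_dict(node):
    """Build one dictionary from a single <Currency> node."""
    return {key: node.find(tag).text for key, tag in FIELD_MAP.items()}


root = ET.fromstring(SAMPLE_XML)

# The loop runs once per <Currency> node, so every currency is read,
# not just the first one.
devize_list = []
for currency in root.iter('Currency'):
    devize_list.append(currency_to_dict(currency))

for devize in devize_list:
    print("devize:", devize)
```

The key point is that find is called on each Currency node inside the loop; calling it on the whole document always returns the first match.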

Alternatively, since you are extracting all of the child elements, combine this with a dictionary comprehension:

# One dictionary per currency; keys are the XML tag names (Name, Unit, ...)
devize_list = [{n.name: n.text for n in c.children if n.name is not None}
               for c in currency_nodes]
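
Note that every value extracted this way is a string, and the rates in the XML sample use a decimal comma (for example 4,485390). If numbers are needed, a conversion step along the following lines might help. This is a sketch under assumptions: parse_rate and RATE_KEYS are hypothetical names, the dictionary is typed in by hand from the AUD entry of the sample, and the values are assumed to contain no thousands separators.

```python
from decimal import Decimal


def parse_rate(text):
    """Convert a decimal-comma string such as '4,485390' to a Decimal."""
    return Decimal(text.strip().replace(',', '.'))


# One entry shaped like the dictionaries built by the list comprehension
# in the answer (values copied from the AUD entry of the XML sample).
devize = {'naziv_valute': 'AUD',
          'jedinica': '1',
          'kupovni': '4,485390',
          'kupovni_strani': '4,530697',
          'srednji': '4,646869',
          'prodajni_strani': '4,786275',
          'prodajni': '4,834138'}

# Hypothetical choice: the keys whose values are exchange rates.
RATE_KEYS = ('kupovni', 'kupovni_strani', 'srednji',
             'prodajni_strani', 'prodajni')

# Copy the dictionary, then replace the string values with numbers.
devize_numeric = dict(devize)
devize_numeric['jedinica'] = int(devize['jedinica'])
for key in RATE_KEYS:
    devize_numeric[key] = parse_rate(devize[key])

print(devize_numeric['srednji'])  # 4.646869
```

Decimal is used here rather than float so that the six decimal places are kept exactly as published.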