Scraping names and addresses into a dict with BeautifulSoup

Asked: 2017-08-08 03:04:35

Tags: python web-scraping beautifulsoup

I want to scrape the names and addresses from this company's member directory page:

http://mfda.ca/members/directory-of-members/

I want the output stored in a dictionary, with the member's name (e.g. 3i Financial Investment Services Inc.) as the key and the address as the value.

I was able to add the names to the dictionary, but for some reason I can't attach each address to its name's key. Can anyone show me how to do this?

import requests
from bs4 import BeautifulSoup

url = "http://mfda.ca/members/directory-of-members/"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)

# name
letters = soup.find_all("div", class_="col-sm-6 col-md-6")

lobbying = {}
for element in letters:
    lobbying[element.b.get_text()] = {}
print(lobbying)

# addr
Addr = soup.find_all("div", class_="col-sm-6 col-md-6 p-marg")
for element in Addr:
    address = element.p.get_text()
    lobbying[element.p.get_text()]["addr"] = address

1 Answer:

Answer 0 (score: 0)

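The second loop in the question indexes lobbying with the address text rather than the member's name, so the address is never attached to the entry it belongs to. I'd suggest scraping the name and the address together while building the dict: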

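# Assumes `soup` is the BeautifulSoup object built in the question.
lobbying = {}
# Each 'row member-name' div is expected to hold both the name and the address.
rows = soup.find_all('div', {'class': 'row member-name'})

for row in rows:
    try:
        name = row.find('div', {'class': 'col-sm-6 col-md-6'})
        addr = row.find('div', {'class': 'col-sm-6 col-md-6 p-marg'})
        lobbying[name.a.b.text] = {'addr': addr.p.text}
    except AttributeError:
        # Skip rows that lack the expected name or address tags.
        pass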

print(lobbying)

Output:

{
    '3i Financial Investment Services Inc.': {
        'addr': 'Suite #221, 9040 Leslie Street\nRichmond Hill, ON L4B 3M4\nPhone: (905) 597-5000\nFax: (905) 597-8366'
    },
    'ARTECH Asset Advisory Services Inc.': {
        'addr': '209 - 3993 Henning Drive\nBurnaby, BC\xa0V5C 6P7\nPhone: (604) 434-3863\nFax: (604) 434-3873'
    }
...
}
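
The addr values above still contain non-breaking spaces (\xa0) and keep the phone and fax numbers on the same string as the street address. If you need those separated, something along these lines might help. This is only a sketch: parse_address is a hypothetical helper (not part of BeautifulSoup), and it assumes every entry uses the "Phone:" and "Fax:" prefixes seen in the sample output.

```python
def parse_address(raw):
    """Split a scraped address block into street lines, phone, and fax.

    Hypothetical helper; assumes the 'Phone:' / 'Fax:' prefixes shown
    in the sample output above.
    """
    # Replace non-breaking spaces (\xa0) with ordinary spaces.
    text = raw.replace("\xa0", " ")
    parsed = {"lines": [], "phone": None, "fax": None}
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("Phone:"):
            parsed["phone"] = line[len("Phone:"):].strip()
        elif line.startswith("Fax:"):
            parsed["fax"] = line[len("Fax:"):].strip()
        else:
            # Anything else is treated as part of the postal address.
            parsed["lines"].append(line)
    return parsed


raw = ("209 - 3993 Henning Drive\nBurnaby, BC\xa0V5C 6P7\n"
       "Phone: (604) 434-3863\nFax: (604) 434-3873")
print(parse_address(raw))
```

To apply it while building the dict, you could store parse_address(addr.p.text) instead of addr.p.text in the loop above.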