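I want to scrape the names and addresses from this company's member directory page:
http://mfda.ca/members/directory-of-members/
I'd like the output stored in a dictionary, with the member's name as the key (e.g. 3i Financial Investment Services Inc.) and the address as the value.
I was able to add the names to the dictionary, but for some reason I can't attach each address to its key. Can anyone guide me on how to do this?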
import requests
from bs4 import BeautifulSoup

url = "http://mfda.ca/members/directory-of-members/"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# name
letters = soup.find_all("div", class_="col-sm-6 col-md-6")
lobbying = {}
for element in letters:
    lobbying[element.b.get_text()] = {}
print(lobbying)

# addr
Addr = soup.find_all("div", class_="col-sm-6 col-md-6 p-marg")
for element in Addr:
    address = element.p.get_text()
    lobbying[element.p.get_text()]["addr"] = address
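Answer 0 (score: 0)
I suggest scraping the name and address together while building the dict, so that each address is stored under its own member's name: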
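# soup is the BeautifulSoup object built in the question
lobbying = {}
# each 'row member-name' div holds one member's name and address
rows = soup.find_all('div', {'class': 'row member-name'})
for row in rows:
    try:
        name = row.find('div', {'class': 'col-sm-6 col-md-6'})
        addr = row.find('div', {'class': 'col-sm-6 col-md-6 p-marg'})
        lobbying[name.a.b.text] = {'addr': addr.p.text}
    except AttributeError:
        # skip rows that lack a name or an address
        pass
print(lobbying)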
Output:
{
    '3i Financial Investment Services Inc.': {
        'addr': 'Suite #221, 9040 Leslie Street\nRichmond Hill, ON L4B 3M4\nPhone: (905) 597-5000\nFax: (905) 597-8366'
    },
    'ARTECH Asset Advisory Services Inc.': {
        'addr': '209 - 3993 Henning Drive\nBurnaby, BC\xa0V5C 6P7\nPhone: (604) 434-3863\nFax: (604) 434-3873'
    }
    ...
}
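If you want the address itself as the value (name -> address string), as the question asks, rather than a nested dict, the same row-by-row idea can be tried offline. The following is only a minimal sketch: SAMPLE_HTML is hypothetical markup that assumes the page is laid out with the class names used above, "No Address Ltd." is a made-up entry, and parse_members is an illustrative helper name, not part of any library. It may also help explain the original error: the question's second loop indexes the dict by the address text, which was never added as a key, whereas here the name and address come from the same row.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup. The real page is ASSUMED to use the same
# class names as in the answer above; "No Address Ltd." is made up.
SAMPLE_HTML = """
<div class="row member-name">
  <div class="col-sm-6 col-md-6"><a href="#"><b>3i Financial Investment Services Inc.</b></a></div>
  <div class="col-sm-6 col-md-6 p-marg"><p>Suite #221, 9040 Leslie Street<br>Richmond Hill, ON L4B 3M4</p></div>
</div>
<div class="row member-name">
  <div class="col-sm-6 col-md-6"><b>No Address Ltd.</b></div>
</div>
"""


def parse_members(html):
    """Return {member name: address string} from directory-style HTML."""
    soup = BeautifulSoup(html, "html.parser")
    members = {}
    # One row per member, so the name and address are read from the same row.
    for row in soup.select("div.row.member-name"):
        name_tag = row.find("b")
        addr_tag = row.select_one("div.p-marg p")
        if name_tag is None or addr_tag is None:
            continue  # skip rows missing either piece
        name = name_tag.get_text(strip=True)
        # Join the text pieces around <br> tags with newlines.
        members[name] = addr_tag.get_text("\n", strip=True)
    return members


members = parse_members(SAMPLE_HTML)
print(members)
```

To run it against the live page, you would pass r.text from the question's requests.get(url) call instead of SAMPLE_HTML. Checking for None explicitly, instead of catching AttributeError, makes it clearer which rows are being skipped.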