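My goal is to write a Python script that retrieves specific data from a website.
Specifically, I need to extract these elements:
<span class="street-address" itemprop="streetAddress">191, Corso Peschiera</span>
and
<div itemprop="telephone" class="tel elementPhone">0184 662271</div>
I only want the phone number and the address, of course.
Extracting plain 'div', 'a', or 'href' elements works without any problem, but I can't get any further than that.
Here is my code. I can't write anything to the file unless I pass bs4 a simple argument such as soup.find_all('a'):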
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.paginegialle.it/ricerca/lidi%20balneari/Torino?')
data = r.text
soup = BeautifulSoup(data,"html.parser")
dia = soup.find_all('<div itemprop="telephone" class="tel elementPhone"></div>')
for link in soup.find_all('<div itemprop="telephone" class="tel elementPhone"></div>'):
    print (dia)
documento=open("mbsprovalive.csv","w")
documento.write(dia)
documento.close()
How can I solve this problem?
Answer 0 (score: 0)
You can use the attrs argument of bs4's find_all to specify exactly which class you are interested in, like this:
#!/usr/bin/env python
from bs4 import BeautifulSoup
import requests

data = requests.get('your url here').text
soup = BeautifulSoup(data, "html.parser")
# Pair each address with the phone number at the same position on the page.
addresses = soup.find_all('span', attrs={'class': 'street-address'})
phones = soup.find_all('div', attrs={'class': 'tel elementPhone'})
for i, j in zip(addresses, phones):
    print(i.text, j.text)
This should work!
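As an alternative, the same lookup could be keyed on the itemprop attributes shown in the question rather than on the class names. This is only a minimal sketch: SAMPLE_HTML and extract_contacts are hypothetical names, the wrapping div is invented for illustration, and it assumes the two snippets from the question are representative of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup built from the two snippets in the question;
# the wrapping <div class="item"> is an assumption, not taken from the site.
SAMPLE_HTML = """
<div class="item">
  <span class="street-address" itemprop="streetAddress">191, Corso Peschiera</span>
  <div itemprop="telephone" class="tel elementPhone">0184 662271</div>
</div>
"""


def extract_contacts(html):
    """Return (address, phone) pairs found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selectors match on itemprop instead of class.
    addresses = soup.select('span[itemprop="streetAddress"]')
    phones = soup.select('div[itemprop="telephone"]')
    # get_text(strip=True) drops leading and trailing whitespace.
    return [(a.get_text(strip=True), p.get_text(strip=True))
            for a, p in zip(addresses, phones)]


contacts = extract_contacts(SAMPLE_HTML)
print(contacts)
```

Like the zip() version above, this pairs results by position, so it would misalign if a listing has an address but no phone number.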
Answer 1 (score: 0)
Here is code that works perfectly. There are just a few annoying formatting problems.
from bs4 import BeautifulSoup
import requests
import csv

r = requests.get('https://www.paginegialle.it/ricerca/pizzerie/Milano?mr=50')
data = r.text
soup = BeautifulSoup(data, "html.parser")
# newline='' stops the csv module from writing blank lines between rows on Windows.
with open('mbsprprova.csv', 'w', newline='') as csvfile:
    fieldnames = ['nome', 'indirizzo', 'telefono']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # Write the header once, before the loop, not once per row.
    writer.writeheader()
    for i, j, z in zip(soup.find_all('span', attrs={'class': 'street-address'}),
                       soup.find_all('div', attrs={'class': 'tel elementPhone'}),
                       soup.find_all('span', attrs={'itemprop': 'name'})):
        writer.writerow({'nome': z.text, 'telefono': j.text, 'indirizzo': i.text})
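The answer does not say what the formatting problems are. Assuming they come from the stray whitespace and newlines that .text often returns, one possible cleanup is sketched below. The helper names clean and write_rows and the sample row are hypothetical, and the sketch writes to an in-memory buffer so it runs without touching the site.

```python
import csv
import io


def clean(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def write_rows(rows, out):
    """Write dict rows to `out` as CSV, emitting the header exactly once."""
    writer = csv.DictWriter(out, fieldnames=["nome", "indirizzo", "telefono"])
    writer.writeheader()
    for row in rows:
        writer.writerow({key: clean(value) for key, value in row.items()})


# Hypothetical scraped values carrying the kind of padding .text can return.
rows = [{
    "nome": "\n  Lido Esempio \n",
    "indirizzo": " 191,  Corso Peschiera",
    "telefono": "0184 662271\n",
}]

buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the real script, the same clean() call could wrap z.text, i.text, and j.text before they are passed to writer.writerow.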