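I'm trying to scrape a website with Python, but I'm having some trouble. I've read plenty of articles online and questions here, but I still can't get it to do what I need. I have this website:
, and I need to print the names of the shops and their addresses and save them to a file (CSV or Excel is fine). I've tried Selenium, pandas and Beautiful Soup, but nothing worked :(
Can anyone help me?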
Answer 0 (score: 1)
import requests
from bs4 import BeautifulSoup

page = requests.get("https://beta.nhs.uk/find-a-pharmacy/results?latitude=51.2457238068354&location=Little%20London%2C%20Hampshire%2C%20SP11&longitude=-1.45959328501975")
soup = BeautifulSoup(page.content, 'html.parser')

# Each search result sits in its own <div class="results__details">
data = soup.find_all("div", class_="results__details")
for container in data:
    Pharmacyname = container.find_all("h2")  # pharmacy name
    Pharmacyadd = container.find_all("p")    # address, phone and other details
    for pharmacy in Pharmacyname:
        # print every paragraph of the result, then the name
        for add in Pharmacyadd:
            print(add.text)
        print(pharmacy.text)
Output:
Shepherds Spring Pharmacy Ltd is 1.8 miles away
The Oval,
Cricketers Way,
Andover,
Hampshire,
SP10 5DN
01264 355700
Map and directions for Shepherds Spring Pharmacy Ltd at The Oval
Services available in Shepherds Spring Pharmacy Ltd at The Oval
Open until 6:15pm today
Shepherds Spring Pharmacy Ltd
Tesco Instore Pharmacy is 2.1 miles away
Tesco Superstore,
River Way,
Andover,
Hampshire,
SP10 1UZ
0345 677 9007
.
.
.
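If bs4 is not available, the same grouping (one name plus its paragraphs per result) can be sketched with the standard library's html.parser. This is a swapped-in technique, not what the answer uses: SAMPLE_HTML is a hypothetical stand-in for the real page (only the results__details class, <h2> and <p> come from the code above), and ResultsParser/parse_results are illustrative names. Unlike the nested loop, it keeps each name paired with its own paragraphs instead of only printing them.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on what the answer's selectors expect:
# one <div class="results__details"> per pharmacy with an <h2> and <p>s.
SAMPLE_HTML = """
<div class="results__details">
  <h2>Shepherds Spring Pharmacy Ltd</h2>
  <p>The Oval,<br>Cricketers Way,<br>Andover,<br>Hampshire,<br>SP10 5DN</p>
  <p>01264 355700</p>
</div>
<div class="results__details">
  <h2>Tesco Instore Pharmacy</h2>
  <p>Tesco Superstore,<br>River Way,<br>Andover,<br>Hampshire,<br>SP10 1UZ</p>
  <p>0345 677 9007</p>
</div>
"""


class ResultsParser(HTMLParser):
    """Collect [name, [paragraph texts]] for each results__details div."""

    def __init__(self):
        super().__init__()
        self.results = []     # one [name, [p_text, ...]] entry per result
        self._div_depth = 0   # > 0 while inside a results__details div
        self._capture = None  # "h2" or "p" while inside one of those tags
        self._buffer = []     # text fragments of the tag being captured

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1  # a div nested inside a result
            elif "results__details" in (dict(attrs).get("class") or "").split():
                self._div_depth = 1
                self.results.append(["", []])
        elif self._div_depth and tag in ("h2", "p"):
            self._capture = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif self._capture and tag == self._capture:
            # join fragments (e.g. split by <br>) and collapse whitespace
            text = " ".join(" ".join(self._buffer).split())
            record = self.results[-1]
            if tag == "h2":
                record[0] = text
            else:
                record[1].append(text)
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)


def parse_results(html):
    """Return a list of (name, paragraphs) tuples, one per result."""
    parser = ResultsParser()
    parser.feed(html)
    parser.close()
    return [(name, paragraphs) for name, paragraphs in parser.results]


for name, paragraphs in parse_results(SAMPLE_HTML):
    print(name, "|", paragraphs[0])
```

With the real page you would pass page.text instead of SAMPLE_HTML; whether the live markup has exactly this shape has not been checked.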
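Note: you can create lists, pharmacy_name and pharmacy_add, to store the data and then write them to a file. PS: you can also remove unwanted text from the lists (for example, the text after each pharmacy's phone number).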
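Following the note, a minimal sketch of the list-then-file step using only the standard csv module. It assumes pharmacy_name holds one name per result and pharmacy_add holds that result's text lines in order. The phone-number pattern (a line of digits and spaces starting with 0, like 01264 355700) is an assumption based on the output above, and split_record, build_rows and write_csv are hypothetical helper names.

```python
import csv
import re

# Assumption: a UK-style phone line made only of digits and spaces, starting
# with 0 (e.g. "01264 355700"). Other formats need a different pattern.
PHONE_RE = re.compile(r"^0[\d ]+$")


def split_record(lines):
    """Return (address, phone) from a result's text lines.

    Lines before the first phone-number line form the address, the phone
    line is returned separately, and everything after it is dropped."""
    address = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if PHONE_RE.match(line):
            return " ".join(address), line
        address.append(line)
    return " ".join(address), ""  # no phone number found


def build_rows(pharmacy_name, pharmacy_add):
    """Pair each name with its lines and produce CSV rows (with a header)."""
    rows = [["name", "address", "phone"]]
    for name, lines in zip(pharmacy_name, pharmacy_add):
        address, phone = split_record(lines)
        rows.append([name.strip(), address, phone])
    return rows


def write_csv(path, rows):
    """Write rows to a UTF-8 CSV file (newline="" as the csv docs recommend)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Example data taken from the output above
pharmacy_name = ["Shepherds Spring Pharmacy Ltd"]
pharmacy_add = [[
    "The Oval,", "Cricketers Way,", "Andover,", "Hampshire,", "SP10 5DN",
    "01264 355700",
    "Map and directions for Shepherds Spring Pharmacy Ltd at The Oval",
    "Open until 6:15pm today",
]]
rows = build_rows(pharmacy_name, pharmacy_add)
```

Calling write_csv("pharmacies.csv", rows) then writes the file. In the answer's loop, pharmacy_add could be filled per result with something like [line for p in container.find_all("p") for line in p.text.splitlines()]; this is untested against the live page, and a leading "is 1.8 miles away" line, if present, would still land in the address and need filtering as well.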
Answer 1 (score: 0)
import re

import requests
import xlsxwriter
from bs4 import BeautifulSoup

workbook = xlsxwriter.Workbook('File.xlsx')
worksheet = workbook.add_worksheet()

request = requests.get("https://beta.nhs.uk/find-a-pharmacy/results?latitude=51.2457238068354&location=Little%20London%2C%20Hampshire%2C%20SP11&longitude=-1.45959328501975")
soup = BeautifulSoup(request.content, 'html.parser')
data = soup.find_all("div", class_="results__details")

# Build one [name, address] pair per result
formed_data = []
for results_details in data:
    name = results_details.find_all("h2")[0].text
    # the second <p> (index 1) is taken as the address;
    # drop newlines, then squeeze runs of spaces into one
    address = re.sub(' +', ' ', results_details.find_all("p")[1].text.replace('\n', ''))
    formed_data.append([name, address])

# Write the name in column A and the address in column B, one pharmacy per row
row = col = 0
for name, address in formed_data:
    worksheet.write(row, col, name)
    worksheet.write(row, col + 1, address)
    row += 1

workbook.close()
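The cleanup in this answer, re.sub(' +', ' ', text.replace('\n', '')), deletes newlines outright, so two address lines separated only by a line break get glued together, and a leading space can survive. A common alternative is " ".join(text.split()), which treats any run of whitespace (spaces, tabs, newlines) as a single separator and trims both ends. The sample string below is made up to show the difference; whether the live page's markup actually triggers it has not been checked.

```python
import re


def clean_like_answer(text):
    """The answer's approach: delete newlines, then squeeze runs of spaces."""
    return re.sub(" +", " ", text.replace("\n", ""))


def clean_whitespace(text):
    """Collapse any whitespace run into one space and strip both ends."""
    return " ".join(text.split())


raw = "\n  The Oval,\n  Cricketers Way,\nAndover\n"
print(repr(clean_like_answer(raw)))  # ' The Oval, Cricketers Way,Andover'
print(repr(clean_whitespace(raw)))   # 'The Oval, Cricketers Way, Andover'
```

In the answer's loop, address = clean_whitespace(results_details.find_all("p")[1].text) would be the drop-in replacement.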