Getting information from a website in an organized way

Time: 2019-02-08 13:51:54

Tags: python web-scraping

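I am trying to scrape a website with Python, but I am running into trouble. I have gone through many articles online and many questions here, but I still cannot do what I need. I have this website: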

https://beta.nhs.uk/find-a-pharmacy/results?latitude=51.2457238068354&location=Little%20London%2C%20Hampshire%2C%20SP11&longitude=-1.45959328501975

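I need to print the names of the shops and their addresses and save them to a file (CSV or Excel). I have already tried Selenium, pandas, and Beautiful Soup, but none of them worked :(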

Can anyone help me?

2 Answers:

Answer 0 (score: 1)

import requests
from bs4 import BeautifulSoup

# Fetch the search results page
page = requests.get("https://beta.nhs.uk/find-a-pharmacy/results?latitude=51.2457238068354&location=Little%20London%2C%20Hampshire%2C%20SP11&longitude=-1.45959328501975")

soup = BeautifulSoup(page.content, 'html.parser')
# Each pharmacy sits in a div with the class "results__details"
data = soup.find_all("div", class_="results__details")

for container in data:
    pharmacy_name = container.find_all("h2")
    pharmacy_add = container.find_all("p")
    for pharmacy in pharmacy_name:
        # Print every paragraph (distance, address, phone, ...), then the name
        for add in pharmacy_add:
            print(add.text)
        print(pharmacy.text)

Output:

Shepherds Spring Pharmacy Ltd is 1.8 miles away

       The Oval, 
       Cricketers Way, 

       Andover, 
       Hampshire, 
       SP10 5DN
      01264 355700

Map and directions for Shepherds Spring Pharmacy Ltd at The Oval
Services available in Shepherds Spring Pharmacy Ltd at The Oval
Open until 6:15pm today
Shepherds Spring Pharmacy Ltd
Tesco Instore Pharmacy is 2.1 miles away

       Tesco Superstore, 
       River  Way, 

       Andover, 
       Hampshire, 
       SP10 1UZ
      0345 677 9007

      .
      .
      .
  

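Note: you can create lists for pharmacy_name and pharmacy_add to store the data, and then write them to a file. P.S. You can also remove unwanted text from the lists (for example, the text that follows each pharmacy's phone number).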
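The note above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SAMPLE_HTML` is hypothetical markup standing in for the live page, the helper names `extract_pharmacies` and `write_csv` are made up for this example, and treating the second `<p>` as the address is an assumption about the page layout.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="results__details">
  <h2>Shepherds Spring Pharmacy Ltd</h2>
  <p>Shepherds Spring Pharmacy Ltd is 1.8 miles away</p>
  <p>
       The Oval,
       Cricketers Way,

       Andover,
       Hampshire,
       SP10 5DN
  </p>
  <p>01264 355700</p>
</div>
<div class="results__details">
  <h2>Tesco Instore Pharmacy</h2>
  <p>Tesco Instore Pharmacy is 2.1 miles away</p>
  <p>
       Tesco Superstore,
       River Way,

       Andover,
       Hampshire,
       SP10 1UZ
  </p>
  <p>0345 677 9007</p>
</div>
"""


def extract_pharmacies(html):
    """Return (name, address) pairs, one per results__details container."""
    soup = BeautifulSoup(html, "html.parser")
    pharmacy_name = []
    pharmacy_add = []
    for container in soup.find_all("div", class_="results__details"):
        heading = container.find("h2")
        paragraphs = container.find_all("p")
        if heading is None or len(paragraphs) < 2:
            continue  # skip containers that do not match the assumed layout
        pharmacy_name.append(heading.get_text(strip=True))
        # Assumption: the second <p> holds the address; collapse its whitespace.
        pharmacy_add.append(" ".join(paragraphs[1].get_text().split()))
    return list(zip(pharmacy_name, pharmacy_add))


def write_csv(rows, stream):
    """Write a header line and the (name, address) rows as CSV to a text stream."""
    writer = csv.writer(stream)
    writer.writerow(["name", "address"])
    writer.writerows(rows)


rows = extract_pharmacies(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To save to disk instead of an in-memory buffer, pass a file opened with `open("pharmacies.csv", "w", newline="", encoding="utf-8")` to `write_csv`; the file name is just an example.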

Answer 1 (score: 0)

import re

import requests
import xlsxwriter
from bs4 import BeautifulSoup

# Create the Excel workbook and a single worksheet
workbook = xlsxwriter.Workbook('File.xlsx')
worksheet = workbook.add_worksheet()

response = requests.get("https://beta.nhs.uk/find-a-pharmacy/results?latitude=51.2457238068354&location=Little%20London%2C%20Hampshire%2C%20SP11&longitude=-1.45959328501975")
soup = BeautifulSoup(response.content, 'html.parser')
data = soup.find_all("div", class_="results__details")

# Build [name, address] pairs: the first <h2> is the name, the second <p> the address
formed_data = []
for results_details in data:
    name = results_details.find_all("h2")[0].text
    # Drop newlines and collapse runs of spaces in the address
    address = re.sub(' +', ' ', results_details.find_all("p")[1].text.replace('\n', ''))
    formed_data.append([name, address])

# Write one pharmacy per row: name in column 0, address in column 1
row = col = 0
for name, address in formed_data:
    worksheet.write(row, col, name)
    worksheet.write(row, col + 1, address)
    row += 1
workbook.close()
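The address cleanup in the code above removes newlines and collapses runs of spaces, which can still leave a leading or trailing space in the cell. A slightly different approach, shown here only as a sketch, collapses all whitespace with `\s+` and strips the ends. The function name `clean_address` and the sample string (modelled on the output shown in the first answer) are assumptions for illustration.

```python
import re


def clean_address(raw):
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


# Sample text shaped like the multi-line address printed in the first answer
raw_address = (
    "\n       The Oval, \n       Cricketers Way, \n\n"
    "       Andover, \n       Hampshire, \n       SP10 5DN\n"
)
print(clean_address(raw_address))
```

With this helper, the address expression in the loop could be replaced by `clean_address(results_details.find_all("p")[1].text)`.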
