How do I scrape data from this particular website?

Time: 2020-01-10 18:01:17

Tags: python web-scraping

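I am trying to scrape the on-duty pharmacies from the website "https://www.pharmacie.be/?max_results=50&txt-zip=1000", but the content does not show up when I use BeautifulSoup or Selenium, so I assume it is generated by JavaScript. Is there a way to get at the content?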

Here is the Selenium version of the code.

from selenium import webdriver

url = "https://www.pharmacie.be/?max_results=50&txt-zip=1000"
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')  # access the browser in incognito mode
options.add_argument('--headless')  # access the browser without having to open it

driver = webdriver.Chrome("C:/webdrivers/chromedriver79.exe", options=options)
driver.get(url)
page = driver.page_source

Any ideas on how to access the <div class="api-results"> tag?

Thx

1 answer:

Answer 0: (score: 1)

The data is loaded via JavaScript, so you can either use Selenium to get it, or use requests and perform the queries manually:

For example:

import json
import requests

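# Free-text place to look up (postal code and town)
search_term = '1000 bruxelles'

# Step 1: TomTom geocoding, turns the search term into coordinates
url1 = 'https://api.tomtom.com/search/2/geocode/{}.json?key=ixTHgmn1oIBAMGhFbkAWgG5ajGKI4psb&limit=1&countrySet=BE'
# Step 2: geowacht.be API, lists pharmacies near a coordinate
url2 = 'https://api.geowacht.be/api-v4/json/pharmacies/near_coordinate?&latitude={}&longitude={}&jsonp=?&max_distance=30&max_results=5&language=fr'

data = requests.get(url1.format(search_term)).json()

# Coordinates of the first (and, with limit=1, only) geocoding match
lat, lon = data['results'][0]['position']['lat'], data['results'][0]['position']['lon']

# Api-user-agent identifies the caller as the site's gwapi.js client
data = requests.get(url2.format(lat, lon), headers={'Api-user-agent':'gwapi.js/4.0 (pharmacie.be)'}).json()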

# print(json.dumps(data, indent=4)) # <-- uncomment to see all data

for result in data['results']:
    print(result['pharmacy']['name'])
    print(result['pharmacy']['address_street'], result['pharmacy']['address_streetnr'])
    print(result['pharmacy']['address_postalcode'], result['pharmacy']['address_locality'])
    print('-' * 80)

Prints:

Pharmacie Tsiokanos
Rue des Fripiers 24
1000 Bruxelles
--------------------------------------------------------------------------------
Pharmacie Etoile du Nord-Jumatex
Rue du Progrès 27
1210 Saint-Josse-ten-Noode
--------------------------------------------------------------------------------
Pharmacie Du Midi
Avenue Fonsny 29
1060 Saint-Gilles
--------------------------------------------------------------------------------
Pharmacie de la Duch.de Brabant
Place de la Duchesse de Brabant 39
1080 Molenbeek-Saint-Jean
--------------------------------------------------------------------------------
Pharmacie Elouriaghli Salwa
Chaussée de Merchtem 98
1080 Molenbeek-Saint-Jean
--------------------------------------------------------------------------------
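
If you want to check the parsing without hitting either API, the two lookups into the JSON can be pulled out into small functions and run against a saved response. This is only a sketch: the function names `extract_position` and `format_pharmacy` are made up for illustration, the field names are the ones used in the loop above, and the sample coordinates are placeholder values rather than real API output.

```python
def extract_position(geocode):
    """Return (lat, lon) of the first geocoding hit, or None if there is none."""
    results = geocode.get("results") or []
    if not results:
        return None
    position = results[0]["position"]
    return position["lat"], position["lon"]


def format_pharmacy(result):
    """Return the name, street line and town line for one entry of data['results']."""
    pharmacy = result["pharmacy"]
    street = "{} {}".format(pharmacy["address_street"], pharmacy["address_streetnr"])
    town = "{} {}".format(pharmacy["address_postalcode"], pharmacy["address_locality"])
    return [pharmacy["name"], street, town]


if __name__ == "__main__":
    # Hand-written sample data in the same shape as the responses used above.
    # The coordinates are placeholders, not values returned by the API.
    sample_geocode = {"results": [{"position": {"lat": 50.85, "lon": 4.35}}]}
    sample_result = {
        "pharmacy": {
            "name": "Pharmacie Tsiokanos",
            "address_street": "Rue des Fripiers",
            "address_streetnr": "24",
            "address_postalcode": "1000",
            "address_locality": "Bruxelles",
        }
    }
    print(extract_position(sample_geocode))
    print("\n".join(format_pharmacy(sample_result)))
```

Returning None when the geocoder finds nothing lets the caller skip the second request instead of failing with an IndexError on `data['results'][0]`.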