Collecting data from an interactive map on a website

Date: 2020-06-26 15:43:49

Tags: python selenium web-scraping beautifulsoup

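I am trying to scrape geolocations from the following two websites:

  1. https://zendantenneskaart.omgeving.vlaanderen.be/ -> For this one I found the underlying JSON source, so it is easy: https://www.mercator.vlaanderen.be/raadpleegdienstenmercatorpubliek/us/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=us:us_zndant_pnt&outputFormat=application/json

  2. http://www.sites.bipt.be/index.php?language=EN -> For this one I cannot find such a JSON file. I also cannot find a way to scrape it with Beautiful Soup, because which pins are visible depends on the zoom level of the map.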
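For reference, reading the JSON source of the first site could look roughly like the sketch below. It assumes the WFS endpoint returns a standard GeoJSON FeatureCollection whose features have Point geometries; the helper names `point_coordinates` and `fetch_points` are made up for illustration, and the coordinates may be in a projected reference system rather than longitude/latitude.

```python
import json
from urllib.request import urlopen

WFS_URL = ("https://www.mercator.vlaanderen.be/raadpleegdienstenmercatorpubliek/us/ows"
           "?service=WFS&version=1.0.0&request=GetFeature"
           "&typeName=us:us_zndant_pnt&outputFormat=application/json")


def point_coordinates(feature_collection):
    """Return (x, y) pairs for every Point feature in a GeoJSON FeatureCollection.

    Assumption: the response follows the usual GeoJSON layout
    (features -> geometry -> coordinates). Features without a
    Point geometry are skipped.
    """
    coords = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            x, y = geometry["coordinates"][:2]
            coords.append((x, y))
    return coords


def fetch_points(url=WFS_URL):
    """Download the feature collection and extract its point coordinates."""
    with urlopen(url) as response:
        return point_coordinates(json.load(response))
```

Calling `fetch_points()` performs the download; `point_coordinates` can also be applied to a JSON file that was saved earlier.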

Any ideas on how to scrape all the geolocations from the second website?

1 answer:

Answer 0 (score: 0)

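You can use the URL http://www.sites.bipt.be/ajaxinterface.php and pass a very large range for the latitude/longitude parameters. That way you get all the data in a single request.

For example: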

import json
import requests
from html import unescape


url = 'http://www.sites.bipt.be/ajaxinterface.php'

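# A deliberately huge bounding box, so that every site falls inside it
payload = {"action": "getSites",
           "latfrom": "-9999",
           "latto": "9999",
           "longfrom": "-9999",
           "longto": "9999",
           "LangSiteTable": "sitesfr"}


response = requests.post(url, data=payload)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()       # list of dicts, one per site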

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data[:10]: # <-- print only first 10 items
    print('{:<50}{:<50}{:<30}{:<40} {:.4f} {:.4f}'.format(d['Eigenaar1'], unescape(d['Locatie']), unescape(d['Adres']), unescape(d['PostcodeGemeente']), float(d['Longitude']), float(d['Latitude'])))

print()
print('Total items:', len(data))

Prints:

Orange Belgium: 203W1_2                           Cité de la Bruyère                                Clos des Marronniers 201      1480 Tubize                              4.2086 50.6810
Telenet: _AN0171A                                 Watertoren                                        Scheeveld                     2870 Puurs                               4.2718 51.0744
Telenet: _AN0235V                                                                                   E34                           2290 Vorselaar                           4.7578 51.2449
Orange Belgium: 148L1_6                           Institut Provincial d'Enseignement Supérieur      Rue du Commerce 14            4100 Seraing                             5.5077 50.6130
Orange Belgium: 198L1_5 / 32198L1_1 / 42198L1_1   Lieu-dit 'Bièster'                                Thier de Coo                  4970 Stavelot                            5.8876 50.3859
Telenet: _NR1363A                                                                                   Route de Sovenne              5560 Houyet                              4.9529 50.1997
Orange Belgium: 181R1_1                                                                             Route Rimbaut / Route Rimbaut 6890 Libin                               5.1504 49.9989
Proximus: 80WAM_00                                                                                  Rue de Hottleux 71            4950 Waimes                              6.0879 50.4152
Orange Belgium: 013R1_8                                                                             Rue Saint-Michel              6870 Saint-Hubert                        5.3666 50.0355
Proximus: 41BIA_00                                Aéroport de Bierset batiment 56                   Aérodrome                     4460 Grâce-Hollogne                      5.4584 50.6416

Total items: 8104
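
If you want to keep the result instead of only printing it, one option is to write the records to CSV. The following is a minimal sketch, assuming `data` is the list of dicts returned by the request above and that each record has the six keys used in the print statement; `sites_to_csv` and `FIELDS` are hypothetical names, not part of the site's API.

```python
import csv
import io
from html import unescape

# Keys taken from the print statement above; other keys in the records are ignored.
FIELDS = ["Eigenaar1", "Locatie", "Adres", "PostcodeGemeente", "Longitude", "Latitude"]


def sites_to_csv(sites):
    """Serialise site records to CSV text, decoding HTML entities in each value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for site in sites:
        # A missing key becomes an empty cell rather than raising KeyError.
        writer.writerow([unescape(str(site.get(field, ""))) for field in FIELDS])
    return buffer.getvalue()
```

For example, `open('sites.csv', 'w', encoding='utf-8', newline='').write(sites_to_csv(data))` would save all items to a file.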