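I am trying to scrape geolocations from the following 2 websites:
https://zendantenneskaart.omgeving.vlaanderen.be/ -> for this one I found the underlying source JSON file, so it is easy: https://www.mercator.vlaanderen.be/raadpleegdienstenmercatorpubliek/us/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=us:us_zndant_pnt&outputFormat=application/json
http://www.sites.bipt.be/index.php?language=EN -> for this one I cannot find such a JSON file; moreover, I cannot find a way to scrape it with Beautiful Soup, because which pins are visible depends on the zoom level of the map
Any ideas on how to scrape all geolocations from the second website?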
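For reference, reading the first source could look roughly like the sketch below. It assumes the WFS endpoint returns a standard GeoJSON FeatureCollection with Point geometries; the function name `extract_points` and the sample data are hypothetical, and the coordinate reference system is not stated here, so the response's `crs` member should be checked before treating the values as longitude/latitude.

```python
def extract_points(feature_collection):
    """Return (x, y) pairs for every Point feature in a GeoJSON FeatureCollection."""
    points = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # Only the first two ordinates are used; a third (height) is ignored.
            x, y = geometry["coordinates"][:2]
            points.append((x, y))
    return points


# Hypothetical sample in the assumed shape (not real data from the service).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [4.35, 50.85]}, "properties": {}},
        {"type": "Feature", "geometry": None, "properties": {}},
    ],
}
print(extract_points(sample))

# With the real service (network request), the input would be something like:
#   import requests
#   fc = requests.get(wfs_url, timeout=30).json()   # wfs_url = the WFS link above
#   points = extract_points(fc)
```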
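Answer 0 (score: 0)
You can use the URL http://www.sites.bipt.be/ajaxinterface.php
and specify a very large range for the latitude/longitude parameters. That way you get all the data in a single request.
For example: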
import json
import requests
from html import unescape

url = 'http://www.sites.bipt.be/ajaxinterface.php'

# a bounding box far larger than the map, so every site falls inside it
payload = {"action": "getSites",
           "latfrom": "-9999",
           "latto": "9999",
           "longfrom": "-9999",
           "longto": "9999",
           "LangSiteTable": "sitesfr"}

data = requests.post(url, data=payload).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data[:10]:  # <-- print only first 10 items
    # text fields contain HTML entities, so unescape them before printing
    print('{:<50}{:<50}{:<30}{:<40} {:.4f} {:.4f}'.format(d['Eigenaar1'], unescape(d['Locatie']), unescape(d['Adres']), unescape(d['PostcodeGemeente']), float(d['Longitude']), float(d['Latitude'])))

print()
print('Total items:', len(data))
Prints:
Orange Belgium: 203W1_2 Cité de la Bruyère Clos des Marronniers 201 1480 Tubize 4.2086 50.6810
Telenet: _AN0171A Watertoren Scheeveld 2870 Puurs 4.2718 51.0744
Telenet: _AN0235V E34 2290 Vorselaar 4.7578 51.2449
Orange Belgium: 148L1_6 Institut Provincial d'Enseignement Supérieur Rue du Commerce 14 4100 Seraing 5.5077 50.6130
Orange Belgium: 198L1_5 / 32198L1_1 / 42198L1_1 Lieu-dit 'Bièster' Thier de Coo 4970 Stavelot 5.8876 50.3859
Telenet: _NR1363A Route de Sovenne 5560 Houyet 4.9529 50.1997
Orange Belgium: 181R1_1 Route Rimbaut / Route Rimbaut 6890 Libin 5.1504 49.9989
Proximus: 80WAM_00 Rue de Hottleux 71 4950 Waimes 6.0879 50.4152
Orange Belgium: 013R1_8 Rue Saint-Michel 6870 Saint-Hubert 5.3666 50.0355
Proximus: 41BIA_00 Aéroport de Bierset batiment 56 Aérodrome 4460 Grâce-Hollogne 5.4584 50.6416
Total items: 8104
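If the goal is to combine these records with the GeoJSON from the first website, one option is to convert them into GeoJSON features as well. The following is a minimal sketch, assuming each record carries the keys used above (`Eigenaar1`, `Locatie`, `Adres`, `PostcodeGemeente`, `Longitude`, `Latitude`); the function name `sites_to_geojson`, the English property names, and the sample records are made up for illustration.

```python
import json
from html import unescape


def sites_to_geojson(records):
    """Convert a list of site records into a GeoJSON FeatureCollection."""
    features = []
    for record in records:
        try:
            lon = float(record["Longitude"])
            lat = float(record["Latitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without usable coordinates
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "owner": record.get("Eigenaar1") or "",
                "location": unescape(record.get("Locatie") or ""),
                "address": unescape(record.get("Adres") or ""),
                "postcode_city": unescape(record.get("PostcodeGemeente") or ""),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical records in the assumed shape; the second one is skipped.
sample_records = [
    {"Eigenaar1": "Proximus: 80WAM_00", "Locatie": "", "Adres": "Rue de Hottleux 71",
     "PostcodeGemeente": "4950 Waimes", "Longitude": "6.0879", "Latitude": "50.4152"},
    {"Eigenaar1": "Example", "Longitude": "n/a", "Latitude": ""},
]
collection = sites_to_geojson(sample_records)
print(json.dumps(collection, indent=4, ensure_ascii=False))
```

With the script above, `sites_to_geojson(data)` would take the full response, and the result can be written to disk with `json.dump`.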