Web scraping a screen image from an interactive web map

Time: 2018-12-21 03:46:37

Tags: python node.js web-scraping beautifulsoup cheerio

I need to extract the map component as a static image from the following page: http://www.bom.gov.au/water/landscape/#/sm/Relative/day/-35.30/145.17/5/Point////2018/12/16/

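The page contains a Leaflet-based interactive web map whose layer data is updated daily through a Web Map Service. The extracted image should include all the layers loaded on the map.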

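This also needs to be automated, so nobody will be opening a web browser and navigating to the URL. The extracted image will go into a Word document.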

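I program in Python and Node.js, but I could not do this with BeautifulSoup for Python or Cheerio for Node.js, because the map is not an img element but a set of dynamically generated DIVs. How can I get the map composition as an image?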

1 answer:

Answer 0: (score: 4)

You can use:

from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.maximize_window()  # maximize the window so the whole map is visible
driver.get("http://www.bom.gov.au/water/landscape/#/sm/Relative/day/-35.30/145.17/5/Point////2018/12/16/")
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium releases removed
element = driver.find_element(By.XPATH, '//*[@id="mapid"]')  # this is the map xpath
location = element.location
size = element.size
driver.save_screenshot("canvas.png")  # screenshot of the whole browser viewport
# bounding box of the map element inside the screenshot
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = Image.open('canvas.png')
im = im.crop((int(left), int(top), int(right), int(bottom)))
im.save('canvas_el.png')  # your file
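
One thing to watch with the crop above: element.location and element.size are reported in CSS pixels, while on some high-DPI displays the saved screenshot is in device pixels, so the crop can land on the wrong region. Below is a minimal sketch of the bounding-box arithmetic with an optional scale factor. The helper name crop_box and the pixel_ratio parameter are assumptions made for illustration and are not part of Selenium or Pillow.

```python
def crop_box(location, size, pixel_ratio=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location and size are dicts shaped like Selenium's element.location
    and element.size. pixel_ratio is a hypothetical scale factor for
    displays where the screenshot is larger than the CSS layout.
    """
    left = location["x"] * pixel_ratio
    top = location["y"] * pixel_ratio
    right = (location["x"] + size["width"]) * pixel_ratio
    bottom = (location["y"] + size["height"]) * pixel_ratio
    # Pillow's Image.crop expects integer pixel coordinates
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Example values only; real ones come from the map element
print(crop_box({"x": 10, "y": 20}, {"width": 300, "height": 200}))
print(crop_box({"x": 10, "y": 20}, {"width": 300, "height": 200}, pixel_ratio=2))
```

The returned tuple can be passed straight to im.crop(...). If the cropped image looks offset, the ratio can be read from the page with driver.execute_script("return window.devicePixelRatio") and passed in as pixel_ratio.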

If you need to iterate over each layer, use the following code:

from time import sleep
from selenium.webdriver.common.by import By

# open the Leaflet layer control so the layer selectors become clickable
driver.find_elements(By.CLASS_NAME, "leaflet-control-layers-toggle")[0].click()
layers = driver.find_elements(By.CLASS_NAME, "leaflet-control-layers-selector")
# select each layer in turn and wait 5 seconds for it to load
for layer in layers:
    layer.click()
    sleep(5)
    # you can also capture screenshots here
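
To capture a screenshot inside that loop, one option is to save the map element directly with Selenium's WebElement.screenshot(filename) instead of cropping a full-page screenshot. The sketch below assumes that approach. The function name capture_each_layer, the output folder, and the layer_NN.png file naming are all made up for illustration, and the fixed wait is only a guess at how long the tiles take to load.

```python
from pathlib import Path
from time import sleep


def capture_each_layer(map_element, layers, out_dir="layers", wait_seconds=5):
    """Click each layer selector and save one image of the map per layer.

    map_element: the Selenium WebElement for the map (e.g. the "mapid" element).
    layers: the list of layer selector elements found on the page.
    Returns the list of saved file paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, layer in enumerate(layers):
        layer.click()
        sleep(wait_seconds)  # assumed delay; tiles may need more or less time
        path = out / f"layer_{index:02d}.png"
        map_element.screenshot(str(path))  # screenshot of this element only
        saved.append(path)
    return saved


# Usage, with driver, element and layers defined as in the snippets above:
# files = capture_each_layer(element, layers)
```

Because the layer control stays open while the layers are clicked, it will probably appear in each saved image. Each file can then be placed in the Word document by whatever tool builds it.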
