Web scraping a canvas element with a loop

Time: 2020-10-27 22:57:54

Tags: python selenium selenium-webdriver web-scraping html5-canvas

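I want to get the data that pops up when the mouse hovers over a region at https://childcaredeserts.org/2018/index.html?state=ID. I am currently using Selenium in Python: the program moves the mouse to the centre of the canvas element and saves the pop-up data to a list, then does the same at another position that I define (moving from the previous point to the next one). The problem is the number of regions and states, which makes it very hard to get all the information (with my approach, I have to move the mouse precisely to every position I want). How can I write a loop that gets the information for every region, or find the position of each region programmatically? Is there any way to get all the elements inside the canvas element?

PS: If the mouse is placed outside the state, an error occurs, so the positions have to be changed for other states.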

Thank you very much.

My code so far:

import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1200x600')
driver = webdriver.Chrome('../Downloads/chromedriver', options=options)

driver.get('https://childcaredeserts.org/2018/index.html?state=WA')
action = ActionChains(driver)
time.sleep(20)  # give the map time to finish loading

canvas = driver.find_element(By.CSS_SELECTOR, "canvas.mapboxgl-canvas")
# move to the centre of the canvas element and click
action.move_to_element(canvas).click().perform()
# the pop-up box that shows the data for the selected region
box = driver.find_element(By.XPATH, '//*[@id="root"]/div/div[2]')
# print(box.text)

res = []

res.append(box.text)

x = -340
y = -130
# move 340 px left and 130 px up from the previous point, then click
action.move_by_offset(x, y).click().perform()
box = driver.find_element(By.XPATH, '//*[@id="root"]/div/div[2]')
# print(box.text)
res.append(box.text)

print(res)
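The two hard-coded hovers above could perhaps be generalised into a grid sweep over the whole canvas. The sketch below is only one possible approach, not a confirmed solution: `grid_offsets`, `sweep_canvas` and the `step` parameter are hypothetical names introduced here, and it assumes that a point outside the state either raises a WebDriver error or leaves the pop-up text unchanged. It reuses the `canvas` element and the pop-up XPath from the code above.

```python
def grid_offsets(width, height, step):
    """Return (x, y) offsets from the centre of a width x height element.

    The points form a regular grid that stays strictly inside the element,
    so the pointer never leaves the canvas.
    """
    half_w, half_h = width // 2, height // 2
    xs = range(-half_w + step // 2, half_w, step)
    ys = range(-half_h + step // 2, half_h, step)
    return [(x, y) for y in ys for x in xs]


def sweep_canvas(driver, canvas, step=40):
    """Click every grid point on the canvas and collect the distinct pop-up texts."""
    # Imported here so grid_offsets can be used without Selenium installed.
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    size = canvas.size  # dict with 'width' and 'height'
    seen = set()
    results = []
    for x, y in grid_offsets(size['width'], size['height'], step):
        try:
            # A fresh chain per point: go to the centre, then offset from it.
            ActionChains(driver).move_to_element(canvas) \
                .move_by_offset(x, y).click().perform()
            text = driver.find_element(
                By.XPATH, '//*[@id="root"]/div/div[2]').text
        except WebDriverException:
            # Assumption: points outside the state end up here; skip them.
            continue
        if text and text not in seen:
            # Many grid points hit the same region, so keep each text once.
            seen.add(text)
            results.append(text)
    return results
```

Calling `sweep_canvas(driver, canvas, step=20)` after the page has loaded would replace the manual `move_by_offset` calls. A smaller `step` misses fewer small regions but takes longer, and regions smaller than the grid spacing can still be skipped.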

My output:

['Census Tract 9612\nChelan County\nChild Care Desert\nLicensed child care providers: 0\nFamily child care homes: 0\nTotal child care capacity: 0\nTotal population: 4682\nPopulation under age 5: 219\nMedian family income: $60,788\nPercent of children with all parents in the labor force: 80%\nMaternal labor force participation: 76%\nPercent non-Hispanic, white: 69%\nPercent non-Hispanic, black/African American: 0%\nPercent Hispanic/Latino: 27%\nChildren per licensed child care slot: No licensed child care providers', 'Census Tract 9901\nClallam County\n  Licensed child care providers: 0\nFamily child care homes: 0\nTotal child care capacity: 0\nTotal population: 0\nPopulation under age 5: 0\nMedian family income: $0\nPercent of children with all parents in the labor force: 0%\nMaternal labor force participation: 0%\nPercent non-Hispanic, white: 0%\nPercent non-Hispanic, black/African American: 0%\nPercent Hispanic/Latino: 0%\nChildren per licensed child care slot: No licensed child care providers\n\n* Sample size is too small to estimate the median family income']
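Each entry in the output above is a newline-separated block, so it could be split into named fields before saving. The following is a minimal sketch, assuming (from the two samples shown) that the first line is the census tract, the second is the county, and the remaining lines are either `label: value` pairs or free-text notes. `parse_tooltip` is a hypothetical helper name, not part of Selenium.

```python
def parse_tooltip(text):
    """Turn one pop-up text into a dict of fields.

    Returns None if the text has fewer than two non-empty lines.
    """
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    if len(lines) < 2:
        return None
    record = {'tract': lines[0], 'county': lines[1], 'notes': []}
    for line in lines[2:]:
        key, sep, value = line.partition(': ')
        if sep:
            record[key] = value          # e.g. 'Total population' -> '4682'
        else:
            record['notes'].append(line)  # e.g. 'Child Care Desert'
    return record
```

With the list from the code above, `[parse_tooltip(t) for t in res]` would give one dictionary per hovered region, which is easier to deduplicate or write to CSV than the raw strings.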

0 answers:

No answers