Bypass cookiewall with Selenium

Date: 2018-10-02 10:42:57

Tags: python python-3.x selenium web-scraping

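I want to scrape job listings from a Dutch job listings website. However, when I try to open the page with Selenium, I run into a cookiewall (because of the new GDPR rules). How can I get past the cookiewall?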

from selenium import webdriver

#launch url
url = "https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO"

# create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

Edit: what I have tried

import pickle
from selenium import webdriver

url = "https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO"

driver = webdriver.Firefox()
driver.set_page_load_timeout(20)
driver.get(url)

# save the cookies of the current session to disk
with open("NVBCookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)

After that, loading the cookies does not work:

with open("NVBCookies.pkl", "rb") as f:
    for cookie in pickle.load(f):
        driver.add_cookie(cookie)

InvalidCookieDomainException: Message: Cookies may only be set for the current domain (cookiewall.vnumediaonline.nl)

It looks like I am not getting the cookies from the cookiewall, right?
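The exception says what WebDriver enforces: add_cookie only accepts a cookie whose domain matches the page currently loaded, and at that moment the driver was on cookiewall.vnumediaonline.nl. A minimal sketch of a hypothetical helper, cookies_for_host, that filters the pickled cookies (assumed to be the list of dicts returned by get_cookies()) down to those valid for a given URL, using the usual cookie domain-matching rule:

```python
from urllib.parse import urlparse


def cookies_for_host(cookies, url):
    """Keep only the cookies that a page at `url` can accept via add_cookie.

    Hypothetical helper: `cookies` is assumed to be the list of dicts
    returned by driver.get_cookies() (for example, the pickled list).
    """
    host = (urlparse(url).hostname or "").lower()
    matching = []
    for cookie in cookies:
        # A leading dot ('.example.nl') means "this domain and its subdomains".
        domain = (cookie.get("domain") or "").lstrip(".").lower()
        # A cookie without a domain is set for the current page's host.
        if not domain or host == domain or host.endswith("." + domain):
            matching.append(cookie)
    return matching
```

Calling it with driver.current_url and passing each returned dict to driver.add_cookie avoids the InvalidCookieDomainException. It does not by itself show that the remaining cookies are enough to get past the wall.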

2 Answers:

Answer 0 (score: 1)

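Instead of bypassing it, you can write code that checks whether the cookiewall is present and, if it is, accepts it; otherwise it continues with the next step. See the code below for details.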

import unittest

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


class PythonOrgSearch(unittest.TestCase):

    def setUp(self):
        # Selenium 4 takes the chromedriver path through a Service object
        service = Service(executable_path="C:\\Users\\USER\\Downloads\\New folder (2)\\chromedriver_win32\\chromedriver.exe")
        self.driver = webdriver.Chrome(service=service)

    def test_search_in_python_org(self):
        driver = self.driver
        driver.get("https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO")

        # click the "accept" button on the cookiewall
        elem = driver.find_element(By.XPATH, "//div[@class='article__button']//button[@id='form_save']")
        elem.click()

    def tearDown(self):
        # quit() ends the whole browser session; close() only closes the window
        self.driver.quit()

if __name__ == "__main__":
    unittest.main()
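The approach described in this answer (accept the wall if it is there, otherwise carry on) can be wrapped in a small reusable helper. A minimal sketch; the name accept_cookiewall_if_present is hypothetical and the default XPath is the form_save button used in both answers. Note that if implicitly_wait(30) is set, as in the question, find_elements waits up to that long before returning an empty list.

```python
def accept_cookiewall_if_present(driver, xpath='//*[@id="form_save"]'):
    """Click the cookiewall's accept button if it is on the current page.

    Hypothetical helper. Returns True when a button was clicked,
    False when none was found.
    """
    # find_elements returns an empty list instead of raising
    # NoSuchElementException, so the "not present" case needs no try/except.
    # "xpath" is the string value of selenium's By.XPATH constant.
    buttons = driver.find_elements("xpath", xpath)
    if not buttons:
        return False  # no cookiewall: continue with the next step
    buttons[0].click()
    return True
```

Returning a boolean lets the caller log or count how often the wall appeared without changing the control flow.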

Answer 1 (score: 0)

from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '//*[@id="form_save"]').click()

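Well, I got Selenium to click the accept button, and that works for me too. I'm not sure whether I will run into the cookiewall again later, though.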
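To cope with the wall showing up again later, one option is to check after every page load whether the browser ended up on the cookiewall host and only then click accept. A minimal sketch; the helper open_past_cookiewall is hypothetical, the host is taken from the InvalidCookieDomainException message in the question, and it assumes that accepting sends the browser back to the requested page (if it does not, calling driver.get(url) once more after the click is a simple fallback).

```python
from urllib.parse import urlparse

# Host taken from the exception message in the question; assumed to be
# where the site redirects visitors who have not consented yet.
COOKIEWALL_HOST = "cookiewall.vnumediaonline.nl"
ACCEPT_XPATH = '//*[@id="form_save"]'  # accept button used in both answers


def open_past_cookiewall(driver, url):
    """Load url and click "accept" if the browser lands on the cookiewall.

    Hypothetical helper. Returns True if the cookiewall was hit (and the
    accept button clicked), False if the page loaded directly.
    """
    driver.get(url)
    if urlparse(driver.current_url).hostname != COOKIEWALL_HOST:
        return False
    # "xpath" is the string value of selenium's By.XPATH constant.
    driver.find_element("xpath", ACCEPT_XPATH).click()
    return True
```

Using it in place of every plain driver.get call means a returning cookiewall is handled wherever it appears, instead of only on the first page.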