Question

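Hello, I'd like to use Python 3.x and BeautifulSoup to scrape data from a website that has an age-verification popup. Unless I click "Yes" on "I am over 21", I can't reach the underlying text and images. Thanks for any help.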

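Edit: Thanks. With help from the comments I see that I could use cookies, but I don't know how to use the requests package to manage, store, and send cookies.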
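For the cookie route mentioned here, a minimal sketch with requests might look like the following. It assumes the site records the age check in a cookie; the cookie name `age_verified` and its value are hypothetical placeholders, and the real ones would have to be read from the browser's developer tools after clicking "Yes". The helper names `make_session_with_cookie` and `fetch_html` are also made up for illustration. If the page builds its content with JavaScript, this approach will likely not be enough and a browser driver is still needed.

```python
import requests


def make_session_with_cookie(name, value, domain):
    """Return a requests.Session that sends the given cookie to `domain`."""
    session = requests.Session()
    # Cookies stored on the session are sent automatically on every
    # later request whose host matches the cookie's domain.
    session.cookies.set(name, value, domain=domain)
    return session


def fetch_html(session, url):
    """Fetch a page using the session's cookies and return its HTML text."""
    response = session.get(url, timeout=10)
    response.raise_for_status()  # Fail loudly on 4xx/5xx responses.
    return response.text
```

Usage would be along the lines of `session = make_session_with_cookie('age_verified', 'true', 'www.shopharborside.com')`, then passing `fetch_html(session, url)` to `BeautifulSoup(..., 'html.parser')`.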

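So, with another user's help, I'm now using the selenium package, so that it also works with graphical overlays (I think?). I can't get it to work with geckodriver yet, but I'll keep trying! Thanks again for all the suggestions.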

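Edit 3: OK, I've made progress: I can open a browser window with geckodriver! Unfortunately it doesn't like the link specification, so I'm posting again. The link for clicking "Yes" on the age verification is hidden on that page and is called mlink ...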

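Edit 4: Made some progress; the updated code is below. I managed to find the element in the XML code, and now I just need to manage to click the link.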

#
import time
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'/Users/jeff/Documents/geckodriver')  # Optional argument; if not specified, PATH is searched.

url = 'https://www.shopharborside.com/oakland/#/shop/412'
driver.get(url)

# Attempt to click "Yes" in the age-verification modal.
# This is the line that does not work yet: `Yes` is undefined and click() takes no arguments.
driver.find_element_by_class_name('hhc_modal-body').click(Yes)

# Wait one second
time.sleep(1)

pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')

# The parsed page is now available as soup
print(soup.prettify())

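New edit: Stuck again; here is the current code. I seem to have isolated the element "myBtnYes", but running the code raises an error: ElementClickInterceptedException: Message: Element is not clickable at point (625,278.5500030517578) because another element obscures it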

 import time
 import selenium
 from selenium import webdriver
 from selenium.webdriver.common.keys import Keys
 from selenium.webdriver.support.ui import WebDriverWait
 from bs4 import BeautifulSoup

 driver = webdriver.Firefox(executable_path=r'/Users/jeff/Documents/geckodriver')  # Optional argument; if not specified, PATH is searched.

 url = 'https://www.shopharborside.com/oakland/#/shop/412'
 driver.get(url)

 # Click the "Yes" button of the age verification.
 # This line raises ElementClickInterceptedException because another element covers the button.
 driver.find_element_by_id('myBtnYes').click()

 # Wait one second
 time.sleep(1)

 pagesource = driver.page_source
 soup = BeautifulSoup(pagesource, 'html.parser')

 # The parsed page is now available as soup
 print(soup.prettify())

Answer 1

If your goal is to click through the verification, use Selenium. P.S. Install selenium and get geckodriver (Firefox) or chromedriver (Chrome).

# Mossein~King (hi, I'm here to help)
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

# Headless mode; this will save you a bunch of research time (trust me)
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)

# For a graphical browser instead (you need geckodriver for Firefox)
# driver = webdriver.Firefox()

url = 'your-url'
driver.get(url)

# Find the link to click
# Example, if the link is <a class='MosseinKing'>
driver.find_element(By.XPATH, "//a[@class='MosseinKing']").click()

# Wait 1 second in case of transitions
time.sleep(1)

pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')

# You can now enjoy the soup
print(soup.prettify())

Web scraping w/ age verification
