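I need to collect information from a web page that I can't link to because it contains adult content. Before I can go any further, I first have to click the age-confirmation button. For now, I'm only interested in getting the page source. However, the simplest solution doesn't work: I tried to find the age button after the page loads, but I get this message: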
C:\Documents and Settings\katie>python test.py
Message: {"errorMessage":"Unable to find element with class name '.enter_pl'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"98","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:1559","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"class name\", \"sessionId\": \"8ef74610-d21e-11e5-a0c9-ede1ada579b6\", \"value\": \".enter_pl\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/8ef74610-d21e-11e5-a0c9-ede1ada579b6/element"}}
Screenshot: available via screen
Here is the code:
#!/usr/bin/env python
# -*- coding: cp1250 -*-
from datetime import datetime
import time
import sys, os
from selenium.webdriver.support.wait import WebDriverWait
from selenium import webdriver
reload(sys)
sys.setdefaultencoding("cp1250")
main_page_url = "" # actual URL removed due to referencing adult content
def get_browser():
    return webdriver.PhantomJS("phantomjs.exe")
try:
    browser = get_browser()
    wait = WebDriverWait(browser, 30)
    browser.get(main_page_url)
    # this lookup fails with the message shown above
    close = browser.find_element_by_class_name('.enter_pl')
    close.click()
    html = browser.page_source
    browser.close()
    print html
except Exception, e:
    print e
I also tried locating the element with XPath, like this:
#!/usr/bin/env python
# -*- coding: cp1250 -*-
from datetime import datetime
import time
import sys, os
from selenium.webdriver.support.wait import WebDriverWait
from selenium import webdriver
reload(sys)
sys.setdefaultencoding("cp1250")
main_page_url = "" # actual URL removed due to referencing adult content
def get_browser():
    return webdriver.PhantomJS("phantomjs.exe")
try:
    browser = get_browser()
    wait = WebDriverWait(browser, 30)
    browser.get(main_page_url)
    # absolute XPath to the age-confirmation button
    button_age_accept = browser.find_element_by_xpath("/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/button")
    button_age_accept.click()
    html = browser.page_source
    browser.close()
    print html
except Exception, e:
    print e
But I get this message as well:
C:\Documents and Settings\katie>python test2.py
Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/button'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"136","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:1655","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"sessionId\": \"63c74110-d21f-11e5-b1f9-dbb94da03942\", \"value\": \"/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/button\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/63c74110-d21f-11e5-b1f9-dbb94da03942/element"}}
Screenshot: available via screen
Here is the HTML of this page: http://pastie.org/private/koxyw655innytv9skcijog
---------------------------------------- EDIT: I tried using Chrome ----------------------------------------
I tried doing exactly the same thing with Chrome instead of PhantomJS, but Chrome shows ERR_SSL_VERSION_OR_CIPHER_MISMATCH:
#!/usr/bin/env python
# -*- coding: cp1250 -*-
from datetime import datetime
import time
from selenium.webdriver.common.by import By
import sys, os
from selenium.webdriver.support.wait import WebDriverWait
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
reload(sys)
sys.setdefaultencoding("cp1250")
main_page_url = "" # actual URL removed due to referencing adult content
def get_browser():
    # both PhantomJS flags need a plain double hyphen "--"
    return webdriver.PhantomJS(executable_path=r'./phantomjs.exe', service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])
def get_chrome():
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument('--ignore-certificate-errors')
    return webdriver.Chrome(chrome_options=options)
try:
    browser = get_chrome()
    wait = WebDriverWait(browser, 100)
    browser.maximize_window()
    browser.get(main_page_url)
    browser.maximize_window()
    wait = WebDriverWait(browser, 20)
    # wait until the age-confirmation button is present in the DOM
    wait.until(EC.presence_of_element_located((By.XPATH, "//*[@id='confirmage']/div[2]/button")))
    elemnt = browser.find_element_by_xpath(".//*[@id='confirmage']/div[2]/button")
    elemnt.click()
    html = browser.page_source
    browser.close()
    print html
    with open('result.txt', 'w') as file_:
        file_.write(html)
except Exception, e:
    print e
    with open('result.txt', 'w') as file_:
        file_.write("ERROR")
Answer 0 (score: 2)
When using find_element_by_class_name(), you should not put a dot in front of the class name:
driver.find_element_by_class_name("enter_pl")
If you use a CSS selector, then you do need the dot:
driver.find_element_by_css_selector(".enter_pl")
You may also need to wait for this element to be visible before clicking it:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
age = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".enter_pl")))
age.click()
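The explicit wait above keeps re-checking its condition until it holds or the timeout expires. To show what that polling amounts to, here is a Python 3 sketch in plain Python: a simplified model, not Selenium's implementation. `wait_until` and `age_button_visible` are hypothetical names, and `LookupError` stands in for "element not found yet" (Selenium's WebDriverWait polls every 0.5 s by default and ignores NoSuchElementException between polls).

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5, ignored=(LookupError,)):
    """Call condition() until it returns something truthy, else raise TimeoutError.

    Simplified model of an explicit wait: exceptions listed in `ignored`
    mean "not ready yet" and are swallowed so the next poll can retry.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # element not there yet; retry after the next sleep
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Hypothetical condition: the age button only "appears" on the third poll.
attempts = {"count": 0}


def age_button_visible():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("button not rendered yet")
    return "<button>"


print(wait_until(age_button_visible, timeout=2, poll_frequency=0.01))  # <button>
```

The point of the model is that a single immediate lookup (as in the question's code) gets exactly one chance, while a wait gets many until the deadline.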
Also, check whether the element is inside an iframe. If it is, you need to switch to the frame's context before searching for the element:
driver.switch_to.frame("frame_name_or_id")
driver.find_element_by_class_name("enter_pl").click()
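If you don't know the iframe's name or id, a helper can try the top-level document first and then each top-level iframe in turn. A Python sketch, assuming only `driver.switch_to.default_content()`, `driver.switch_to.frame(element)` and `driver.find_elements(by, value)`, with locator strategies passed as the plain strings behind Selenium's `By` constants (`"css selector"`, `"tag name"`). `find_in_frames` is a hypothetical helper, not a Selenium API, and it does not descend into nested iframes.

```python
def find_in_frames(driver, by, value):
    """Return the first element matching (by, value) in the top-level document
    or in any top-level <iframe>, or None if nothing matches.

    On success the driver is left switched to the context that holds the
    element, so it can be clicked right away. Nested iframes are not searched.
    """
    driver.switch_to.default_content()
    matches = driver.find_elements(by, value)
    if matches:
        return matches[0]
    # The <iframe> elements belong to the top-level document, so collect them
    # there and go back to it before switching into each one.
    frames = driver.find_elements("tag name", "iframe")
    for frame in frames:
        driver.switch_to.default_content()
        driver.switch_to.frame(frame)
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    driver.switch_to.default_content()
    return None


# Hypothetical usage with an existing driver:
#   button = find_in_frames(driver, "css selector", ".enter_pl")
#   if button is not None:
#       button.click()
```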
Answer 1 (score: 2)
It seems PhantomJS is returning an empty page source. Adding flags that make it ignore SSL errors might help:
def get_browser():
    # both flags use a plain double hyphen "--"
    return webdriver.PhantomJS('phantomjs.exe', service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])
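To confirm the "empty page source" diagnosis before hunting for the button, one option is to check whether the document body contains anything at all. A heuristic Python 3 sketch using only the standard library's `html.parser`; `page_is_blank` is a hypothetical helper, not part of Selenium.

```python
from html.parser import HTMLParser


class _BodyContent(HTMLParser):
    """Record whether <body> contains any element or non-whitespace text."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.has_content = True  # any element inside <body> counts

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.has_content = True  # visible text inside <body> counts


def page_is_blank(page_source):
    """Return True if the page has no <body> content at all."""
    parser = _BodyContent()
    parser.feed(page_source)
    parser.close()
    return not parser.has_content


print(page_is_blank("<html><head></head><body></body></html>"))  # True
print(page_is_blank('<html><body><button class="close">Enter</button></body></html>'))  # False

# Hypothetical usage after browser.get(...):
#   if page_is_blank(browser.page_source):
#       print("empty page; check the SSL-related flags")
```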
Answer 2 (score: 0)
From what I can see, the class on the button is 'close', not 'enter_pl'.
Answer 3 (score: 0)
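The dot in front of the class name denotes the class attribute in a CSS selector. When you use find_element_by_class_name, the driver looks for a class name that literally starts with a dot. You can use either of these approaches: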
browser.find_element_by_class_name('enter_pl')
# or
browser.find_element_by_css_selector('.enter_pl')
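A class-name lookup compares the name you pass against the whitespace-separated tokens of the element's class attribute, so '.enter_pl' only matches a token that literally begins with a dot. A Python 3 sketch of that matching rule using the standard library's `html.parser`; it is a simplified model, not how Selenium or the browser implements the lookup, and the markup and the `find_by_class_name` helper are hypothetical.

```python
from html.parser import HTMLParser


class ClassNameFinder(HTMLParser):
    """Collect tag names whose class attribute contains `class_name` as a token."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is split on whitespace and each token is compared
        # exactly, so ".enter_pl" only matches a token that starts with a dot.
        tokens = (dict(attrs).get("class") or "").split()
        if self.class_name in tokens:
            self.matches.append(tag)


def find_by_class_name(markup, class_name):
    finder = ClassNameFinder(class_name)
    finder.feed(markup)
    finder.close()
    return finder.matches


# Hypothetical markup, not the real page's HTML.
PAGE = '<div id="confirmage"><button class="btn enter_pl">Enter</button></div>'

print(find_by_class_name(PAGE, "enter_pl"))   # ['button']
print(find_by_class_name(PAGE, ".enter_pl"))  # []
```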
Edit
The class you are looking for seems to be close. Try:
browser.find_element_by_class_name('close')
# or
browser.find_element_by_css_selector('.close')
Answer 4 (score: 0)
Have you tried using Actions? Using WebDriver in Java:
Most of the time, if the element can be located directly, moveToElement works for me.
Instead of the absolute XPath /html/body/div[1]/div[2]/div[1]/div[1]/div[2]/button, try pointing to the shortest specific path that works, such as //div[1]/div[2]/button. If that shortest XPath doesn't fetch the desired element, add another parent element to it.
We could also use JavascriptExecutor here to click the button.
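import org.openqa.selenium.By;
import org.openqa.selenium.interactions.Actions;
Actions act = new Actions(driver);
act.moveToElement(driver.findElement(By.className("enter_pl"))).click().build().perform(); // class name without a leading dot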
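I just tried a better way to handle this with Java + WebDriver. I hope you can try the same logic with an accurate locator (since no HTML was provided, and according to the question the class name doesn't work).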
Thank you, Murali
Answer 5 (score: 0)
I saw this yesterday. The code below works for me, using the XPath .//*[@id='confirmage']/div[2]/button
In Java (note: I can give you Python code if you need it):
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
driver.get("url");
driver.manage().window().maximize();
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
driver.findElement(By.xpath(".//*[@id='confirmage']/div[2]/button")).click();
This Python script works fine for me:
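from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("your url")
driver.maximize_window()
wait = WebDriverWait(driver, 20)
# wait until the age-confirmation button is present in the DOM
wait.until(EC.presence_of_element_located((By.XPATH, "//*[@id='confirmage']/div[2]/button")))
elemnt = driver.find_element_by_xpath(".//*[@id='confirmage']/div[2]/button")
elemnt.click()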
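Why an id-anchored XPath like .//*[@id='confirmage']/div[2]/button tends to survive where a purely positional path breaks can be shown with Python 3's xml.etree.ElementTree, which supports a small XPath subset. This is a sketch over two hypothetical versions of the popup markup (the real page's HTML is not reproduced here); the second adds one extra div before the popup. ElementTree does not accept a leading "/" on an element, so "body/div[1]/div[2]/button" stands in for an absolute path.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified popup markup.
PAGE_A = """
<html><body>
  <div id="confirmage">
    <div><p>Are you 18?</p></div>
    <div><button class="close">Enter</button></div>
  </div>
</body></html>
"""

# Same popup, but the site has inserted a banner div in front of it.
PAGE_B = """
<html><body>
  <div id="banner"><p>Sale</p></div>
  <div id="confirmage">
    <div><p>Are you 18?</p></div>
    <div><button class="close">Enter</button></div>
  </div>
</body></html>
"""

# Purely positional path, evaluated from the <html> root element.
POSITIONAL = "body/div[1]/div[2]/button"
# Path anchored on the stable id attribute, as used in the answers above.
ID_ANCHORED = ".//*[@id='confirmage']/div[2]/button"


def find_button(markup, path):
    """Return the first element matching `path`, or None."""
    root = ET.fromstring(markup.strip())
    return root.find(path)


for name, page in (("A", PAGE_A), ("B", PAGE_B)):
    print(name,
          find_button(page, POSITIONAL) is not None,
          find_button(page, ID_ANCHORED) is not None)
# A True True
# B False True
```

The positional path silently points at the banner once the layout shifts, while the id-anchored path still lands on the button.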