ruby selenium web driver - get Google Knowledge Graph content

Time: 2015-06-16 15:35:40

Tags: javascript css ruby selenium web-scraping

I am using the Ruby Selenium web driver and trying to get the content of the Google Knowledge Graph, which sits in <div class="xpdopen"> at the top right of the first Google search results page:

@driver = Selenium::WebDriver.for :phantomjs
@driver.manage.timeouts.implicit_wait = 10
@driver.get "http://google.com"
element = @driver.find_element :name => "q"
element.send_keys "BMW"
element.submit
content = @driver.find_element(:class, 'xpdopen')

But Selenium cannot find this element and raises an error:

#<Selenium::WebDriver::Error::NoSuchElementError: {"errorMessage":"Unable to find element with class name 'xpdopen'"

When I try $('.xpdopen') in the Chrome JS console, it finds the element immediately.

I also tried

@driver.execute_script("return document.getElementsByClassName('xpdopen');")

but that does not find the element either.

I also checked @driver.page_source: <div class="xpdopen"> is not in the page source, yet I can see it in the Chrome console. Why?

How can I get this element with Selenium?

Here is what I get from pry:
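
To narrow this down, it can help to check the serialized page source and the live DOM in one step. The following is a minimal sketch rather than a definitive diagnosis; `element_presence` is a hypothetical helper name, and it only assumes that the driver responds to `page_source` and `find_elements`, as the @driver above does.

```ruby
# Hypothetical helper: reports whether a class name shows up in the
# page source and how many live elements carry it.
# `driver` is assumed to respond to #page_source and #find_elements,
# like the @driver created in the question.
def element_presence(driver, class_name)
  {
    # Plain substring check against the HTML the driver currently holds.
    in_source: driver.page_source.include?(class_name),
    # find_elements returns an empty array instead of raising.
    match_count: driver.find_elements(:class, class_name).size
  }
end

# Usage with the driver from the question (not executed here):
#   p element_presence(@driver, 'xpdopen')
```

If both values come back negative, the element was probably never delivered to this browser, so no locator would find it.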

[21] pry(main)> @driver = Selenium::WebDriver.for :phantomjs
=> #<Selenium::WebDriver::Driver:0x..f822d288ec7f0a708 browser=:phantomjs>
[22] pry(main)> @driver.manage.timeouts.implicit_wait = 10    
=> 10
[23] pry(main)> @driver.get "http://google.com"    
=> {}
[24] pry(main)> element = @driver.find_element :name => "q"    
=> #<Selenium::WebDriver::Element:0x..f389f4a8876f601e id=":wdc:1434526425103">
[25] pry(main)> element.send_keys "BMW"    
=> nil
[26] pry(main)> element.submit    
=> {}
[27] pry(main)> sleep 10    
=> 10
[28] pry(main)> content = @driver.find_element(:xpath, '//*[@id="rhs_block"]/ol/li/div[1]/div')    
Selenium::WebDriver::Error::NoSuchElementError: {"errorMessage":"Unable to find element with xpath '//*[@id=\"rhs_block\"]/ol/li/div[1]/div'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"gzip;q=1.0,deflate;q=0.6,identity;q=0.3","Connection":"close","Content-Length":"67","Content-Type":"application/json; charset=utf-8","Host":"127.0.0.1:8929","User-Agent":"Ruby"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"xpath\",\"value\":\"//*[@id=\\\"rhs_block\\\"]/ol/li/div[1]/div\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/2f3cf350-14c3-11e5-9f8e-4173e8049986/element"}} (org.openqa.selenium.NoSuchElementException)

[29] pry(main)> content = @driver.find_element(:css, "#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen")
Selenium::WebDriver::Error::NoSuchElementError: {"errorMessage":"Unable to find element with css selector '#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"gzip;q=1.0,deflate;q=0.6,identity;q=0.3","Connection":"close","Content-Length":"113","Content-Type":"application/json; charset=utf-8","Host":"127.0.0.1:8929","User-Agent":"Ruby"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"css selector\",\"value\":\"#rhs_block \\u003e ol \\u003e li \\u003e div.kp-blk._Jw._Rqb._RJe \\u003e .xpdopen\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/2f3cf350-14c3-11e5-9f8e-4173e8049986/element"}} (org.openqa.selenium.NoSuchElementException)

Just to show that it has no problem finding other elements on the same page:

[30] pry(main)> results = @driver.find_elements(:xpath, "//p/a") 
=> [#<Selenium::WebDriver::Element:0x6f6a74631e2b7010 id=":wdc:1434527087873">,
 #<Selenium::WebDriver::Element:0x7b6d276448081688 id=":wdc:1434527087874">,
 #<Selenium::WebDriver::Element:0x..f9504a4171b03970a id=":wdc:1434527087875">,
 #<Selenium::WebDriver::Element:0x..fa6e0158aa8d24e2a id=":wdc:1434527087876">,
 #<Selenium::WebDriver::Element:0x327bf842e4399368 id=":wdc:1434527087877">,
 #<Selenium::WebDriver::Element:0x..fae292d7ca211ab32 id=":wdc:1434527087878">,
 #<Selenium::WebDriver::Element:0x129a58eb5ed6ee9c id=":wdc:1434527087879">,
 #<Selenium::WebDriver::Element:0x46ef3b45800e63e0 id=":wdc:1434527087880">,
 #<Selenium::WebDriver::Element:0x26bfb47f8ad498ea id=":wdc:1434527087881">,
 #<Selenium::WebDriver::Element:0x..f03756c2924a2974 id=":wdc:1434527087882">,
 #<Selenium::WebDriver::Element:0xfba93aab4b32af8 id=":wdc:1434527087883">]

I took screenshots and found that phantomjs does not render the Knowledge Graph at all (no content).
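
A comparison like this can be made repeatable by saving the screenshot and the page source for each browser. This is a sketch under assumptions: `dump_page` and its `label` and `dir` parameters are hypothetical names, and the driver is assumed to respond to `save_screenshot` and `page_source`.

```ruby
# Hypothetical helper: stores a screenshot and the page source side by side
# so the output of two browsers can be compared afterwards.
# `driver` is assumed to respond to #save_screenshot and #page_source.
def dump_page(driver, label, dir = '.')
  png  = File.join(dir, "#{label}.png")
  html = File.join(dir, "#{label}.html")
  driver.save_screenshot(png)           # what the browser rendered
  File.write(html, driver.page_source)  # the markup the driver currently holds
  [png, html]
end

# Usage (not executed here), once per browser:
#   dump_page(@driver, 'phantomjs')
```

Diffing the two saved HTML files would show whether the xpdopen block is missing from the markup or merely not rendered.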

Screenshot from phantomjs: [image: phantomjs page content]

Screenshot from Firefox: [image: Firefox page content]

Why does phantomjs have no Knowledge Graph content?

1 answer:

Answer 0: (score: 0)

Apparently the CSS lookup cannot locate the xpdopen class on its own; you have to give the full path to the element:

XPath:

content = @driver.find_element(:xpath, '//*[@id="rhs_block"]/ol/li/div[1]/div')

CSS:

content = @driver.find_element(:css, "#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen")
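
Long selectors like these depend on generated class names such as `_Jw`, `_Rqb` and `_RJe`, which may change without notice. As a hedged sketch, one could try several locators in order and take the first that matches; `first_match` is a hypothetical helper, and the locators in the usage comment are simply the ones that appear in this thread. It does not help if the element is absent from the page phantomjs received, as the screenshots in the question suggest.

```ruby
# Hypothetical helper: tries each [how, what] locator in order and returns
# the first matching element, or nil when none of them match.
# `driver` is assumed to respond to #find_elements like a Selenium driver.
# Note: with an implicit wait set, every miss may block for the full timeout.
def first_match(driver, locators)
  locators.each do |how, what|
    found = driver.find_elements(how, what)
    return found.first unless found.empty?
  end
  nil
end

# Usage with the driver from the question (not executed here):
#   content = first_match(@driver, [
#     [:class, 'xpdopen'],
#     [:xpath, '//*[@id="rhs_block"]/ol/li/div[1]/div'],
#     [:css,   '#rhs_block > ol > li > div.kp-blk._Jw._Rqb._RJe > .xpdopen']
#   ])
#   puts content.text if content
```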