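I want to scrape the fully rendered web page of Google Play Store search results.
The fully rendered page contains all of the search results, whereas the unrendered page has only 20 items (see https://play.google.com/store/search?q=best&c=apps&hl=en).
I tried to scrape the page with Selenium, but got the following error message: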
Traceback (most recent call last):
  File "play_test_2.py", line 25, in test_play_test2
    driver.find_element_by_id("show-more-button").click()
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webelement.py", line 65, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webelement.py", line 385, in _execute
    return self._parent.execute(command, params)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
    self.error_handler.check_response(response)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 166, in check_response
    raise exception_class(message, screen, stacktrace)
ElementNotVisibleException: Message: Element is not currently visible and so may not be interacted with
Stacktrace:
at fxdriver.preconditions.visible (file:///var/folders/8_/n90htn1d0_j4h7l9yt04chl80000gn/T/tmpErWdUz/extensions/fxdriver@googlecode.com/components/command-processor.js:8959:5)
at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/8_/n90htn1d0_j4h7l9yt04chl80000gn/T/tmpErWdUz/extensions/fxdriver@googlecode.com/components/command-processor.js:11618:1)
at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/8_/n90htn1d0_j4h7l9yt04chl80000gn/T/tmpErWdUz/extensions/fxdriver@googlecode.com/components/command-processor.js:11635:11)
at fxdriver.Timer.prototype.setTimeout/<.notify (file:///var/folders/8_/n90htn1d0_j4h7l9yt04chl80000gn/T/tmpErWdUz/extensions/fxdriver@googlecode.com/components/command-processor.js:548:5)
The code below was generated by Selenium IDE.
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException
import unittest, time, re

class PlayTest2(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.implicitly_wait(30)
        self.base_url = "https://play.google.com/"
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_play_test2(self):
        driver = self.driver
        driver.get(self.base_url + "/store/search?q=best&c=apps")
        driver.find_element_by_id("gbqfb").click()
        driver.find_element_by_id("show-more-button").click()
        driver.find_element_by_id("show-more-button").click()

    def is_element_present(self, how, what):
        try: self.driver.find_element(by=how, value=what)
        except NoSuchElementException, e: return False
        return True

    def is_alert_present(self):
        try: self.driver.switch_to_alert()
        except NoAlertPresentException, e: return False
        return True

    def close_alert_and_get_its_text(self):
        try:
            alert = self.driver.switch_to_alert()
            alert_text = alert.text
            if self.accept_next_alert:
                alert.accept()
            else:
                alert.dismiss()
            return alert_text
        finally: self.accept_next_alert = True

    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()
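I think the error occurs because the "Show more" button on the Google Play search results page is only displayed once certain conditions are met, such as scrolling down, back up, and then down again.
How can I solve this problem and scrape the Google Play search results page?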
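Answer 0 (score: 0)
test_play_test2 calls
driver.find_element_by_id("show-more-button").click()
twice.
I suspect that after the first click Selenium can no longer find the same element, so the second call fails.
Just remove the last line from this method.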
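Update: You are absolutely right, the problem is that the button only appears after the page has been scrolled. So we have to scroll the window. The following JavaScript command is useful:
((JavascriptExecutor) driver).executeScript("window.scrollTo(0, document.body.scrollHeight);");
We have to execute this command several times: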
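// "element" is the WebElement for the button, located beforehand.
// Scroll to the bottom up to five times, stopping once the element is visible.
for (int i = 0; i < 5; i++) {
    ((JavascriptExecutor) driver).executeScript("window.scrollTo(0, document.body.scrollHeight);");
    try {
        (new WebDriverWait(driver, 5 /* sec */))
            .until(ExpectedConditions.visibilityOf(element));
        break;
    } catch (org.openqa.selenium.TimeoutException e) {
        // Not visible yet; scroll again.
    }
}
if (element.isDisplayed()) {
    element.click();
}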
Sorry for using Java code in this example, but I hope it gives you the idea.
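Since the question uses the Python bindings, here is a rough Python sketch of the same scroll-then-wait idea. The helper name click_after_scrolling and its parameters are made up for illustration. To keep the snippet free of imports beyond the standard library, it polls is_displayed() by hand instead of using Selenium's WebDriverWait with expected_conditions.visibility_of, which would be the closer equivalent of the Java code. It assumes the button is already present in the DOM and merely hidden, as the ElementNotVisibleException suggests. I have not run it against the live Play Store page.

```python
import time


def click_after_scrolling(driver, element_id, attempts=5, wait_seconds=5.0, poll=0.5):
    """Scroll to the bottom of the page until the element is visible, then click it.

    Hypothetical helper: returns True if the element was clicked,
    False if it never became visible within the given attempts.
    """
    for _ in range(attempts):
        # Same JavaScript as in the Java snippet: jump to the bottom of the page.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        deadline = time.time() + wait_seconds
        while True:
            # "id" is the string value of Selenium's By.ID locator strategy.
            # Assumption: the element exists in the DOM; otherwise this raises.
            element = driver.find_element("id", element_id)
            if element.is_displayed():
                element.click()
                return True
            if time.time() >= deadline:
                break  # Still hidden after waiting; scroll again.
            time.sleep(poll)
    return False
```

In the test from the question, this would replace the two find_element_by_id("show-more-button").click() lines with something like click_after_scrolling(driver, "show-more-button"), called once for each batch of results you want to load.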