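I built a web crawler with Selenium for a research project that needs to simulate user interaction. I am currently running 8 separate WebDriver instances (processes), each driving a custom Chromium binary.
The main problem is that the WebDriver sometimes (non-deterministically) launches Chromium and then freezes on the data:, page that is always opened first, i.e. it never navigates to the URL I asked for. The screenshot below shows what happens.
This happens after roughly 100 URLs have been crawled. I could not find a solution on StackOverflow :(. Any help would be greatly appreciated :D.
Screenshot of a Chromium instance frozen on the data:, page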
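One way to keep a single frozen instance from stalling the whole crawl is to put a hard deadline around each blocking call and restart the browser when the deadline passes. The following is only a sketch of that idea in plain Java, not a confirmed fix for the freeze: `HangGuard` and `finishesWithin` are hypothetical names, and the lambdas in `main` are stand-ins for a real call such as `driver.get(url)`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HangGuard {
    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns true if the task finished in time, false if it timed out,
     * threw an exception, or the waiting thread was interrupted.
     */
    static boolean finishesWithin(Callable<?> task, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-guard");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<?> result = worker.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker thread
            return false;
        } catch (ExecutionException e) {
            return false; // the task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a navigation call such as driver.get(url).
        boolean fast = finishesWithin(() -> "loaded", 1000);
        boolean stuck = finishesWithin(() -> { Thread.sleep(5000); return "never"; }, 200);
        System.out.println(fast + " " + stuck); // prints "true false"
    }
}
```

If the guard returned false for a navigation, the crawler could quit that driver, start a fresh one, and requeue the URL. Cancelling only interrupts the worker thread, so a browser process that is truly stuck would still have to be killed separately, and none of this explains why the freeze happens in the first place.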
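Edit: Also, I don't know whether this affects it, but my WebDriver emulates a mobile device, so I manually set the Chromium window size to the size of the mobile device's screen. I sometimes get the error shown below.
Exception in thread "main" java.lang.reflect.InvocationTargetException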
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: org.openqa.selenium.WebDriverException: unknown error: Cannot read property 'getCurrent' of undefined
JavaScript stack:
TypeError: Cannot read property 'getCurrent' of undefined
at updateWindow (chrome-extension://aapnijgdinlhnhlmodcfapnahmbfebeb/background.js:66:17)
at eval (eval at executeAsyncScript (unknown source), <anonymous>:2:23)
at executeAsyncScript (<anonymous>:321:26)
at apply.height (<anonymous>:337:29)
at callFunction (<anonymous>:229:33)
at apply.height (<anonymous>:239:23)
at <anonymous>:240:3
at Object.InjectedScript._evaluateOn (<anonymous>:878:140)
at Object.InjectedScript._evaluateAndWrap (<anonymous>:811:34)
at Object.InjectedScript.evaluate (<anonymous>:667:21)
(Session info: chrome=49.0.2594.0)
(Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Linux 3.19.0-25-generic x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 893 milliseconds
Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
System info: host: 'momalad-virtual-machine', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'amd64', os.version: '3.19.0-25-generic', java.version: '1.8.0_72-internal'
Driver info: org.openqa.selenium.chrome.ChromeDriver
Capabilities [{applicationCacheEnabled=false, rotatable=false, mobileEmulationEnabled=true, chrome={userDataDir=/tmp/.com.google.Chrome.ftGDsH}, takesHeapSnapshot=true, databaseEnabled=false, handlesAlerts=true, hasTouchScreen=true, version=49.0.2594.0, platform=LINUX, browserConnectionEnabled=false, nativeEvents=true, acceptSslCerts=true, locationContextEnabled=true, webStorageEnabled=true, browserName=chrome, takesScreenshot=true, javascriptEnabled=true, cssSelectorsEnabled=true}]
Session ID: dbc9347c26adc9507436d38f3c2bbd58
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:206)
at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:158)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:678)
at org.openqa.selenium.remote.RemoteWebDriver$RemoteWebDriverOptions$RemoteWindow.setSize(RemoteWebDriver.java:891)
at src.com.chrome.TestChromeDriver.setupPosition(TestChromeDriver.java:160)
at src.com.chrome.TestChromeDriver.setupDriver(TestChromeDriver.java:155)
at src.com.chrome.TestChromeDriver.crawl(TestChromeDriver.java:256)
at src.com.chrome.TestChromeDriver.main(TestChromeDriver.java:113)
... 5 more
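Because the setSize failure is intermittent, one possible mitigation is to retry the call a few times before giving up. The sketch below is a generic helper in plain Java and rests on assumptions: `Retry.run` is a hypothetical name, the flaky action in `main` only simulates the failure, and it is not known whether retrying actually clears this particular error. It catches `RuntimeException` because Selenium's `WebDriverException` is unchecked.

```java
final class Retry {
    /**
     * Runs the action up to maxAttempts times, pausing between failures.
     * Returns the number of the attempt that succeeded (starting at 1)
     * and rethrows the last exception if every attempt fails.
     */
    static int run(Runnable action, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    pause(pauseMillis);
                }
            }
        }
        throw last;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        // Simulated flaky call: fails twice, then succeeds.
        int[] calls = {0};
        int attempt = run(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated intermittent failure");
            }
        }, 5, 10);
        System.out.println("succeeded on attempt " + attempt); // prints "succeeded on attempt 3"
    }
}
```

In the crawler this would wrap the resize, for example `Retry.run(() -> driver.manage().window().setSize(desiredSize), 3, 500)`, where the attempt count and pause are arbitrary illustrative values.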
Edit 2: For clarity, I have added my WebDriver setup code as requested:
/**
 * Sets the browser in mobile mode (Nexus 5)
 * Sets the timeout for pages that don't load to 15 seconds
 */
public void initBrowser(){
    // Optional: if not specified, WebDriver searches the PATH for chromedriver.
    // If it cannot be found there, it needs to be moved into the local directory of this Java project.
    System.setProperty("webdriver.chrome.driver", "/mnt/crawl/mobmalad/ChromeAdCrawler/Dependencies/chromedriver");
    setupDriver();
}

private void setupDriver(){
    // Point ChromeDriver at the custom Chromium build
    ChromeOptions opts = new ChromeOptions();
    opts.setBinary("/mnt/crawl/mobmalad/ChromeAdCrawler/Dependencies/Release/chrome");
    opts.addArguments("--no-sandbox");
    // Emulate a mobile device
    Map<String, String> mobileEmulation = new HashMap<String, String>();
    mobileEmulation.put("deviceName", "Google Nexus 5");
    opts.setExperimentalOption("mobileEmulation", mobileEmulation);
    DesiredCapabilities capabilities = DesiredCapabilities.chrome();
    capabilities.setCapability(ChromeOptions.CAPABILITY, opts);
    driver = new ChromeDriver(capabilities);
    driver.manage().timeouts().pageLoadTimeout(15, TimeUnit.SECONDS);
    driver.manage().timeouts().implicitlyWait(1, TimeUnit.SECONDS);
    setupPosition();
}

private void setupPosition(){
    // Determine the desired height and width of the window and store them as a dimension
    int desiredHeight = 728;
    int desiredWidth = 414; // Half the screen width
    Dimension desiredSize = new Dimension(desiredWidth, desiredHeight);
    // Set the size of the window to the dimension determined above
    driver.manage().window().setSize(desiredSize);
    // First half of the crawlers goes on the top row, second half on the row below
    if(crawlerID < (crawlers/2)){
        driver.manage().window().setPosition(new Point((int) (crawlerID*desiredWidth*.6), 0));
    }else{
        driver.manage().window().setPosition(new Point((int) ((crawlerID-(crawlers/2))*desiredWidth*.6), (int) (desiredHeight)));
    }
}
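
To make the tiling arithmetic in setupPosition() easier to check, here is a standalone sketch that computes the same window origins without starting a browser. `TileLayout` and `positionFor` are hypothetical names; the constants are copied from the code above, and 8 crawlers is assumed to match the 8 instances mentioned at the top.

```java
final class TileLayout {
    static final int WIDTH = 414;      // desiredWidth in setupPosition()
    static final int HEIGHT = 728;     // desiredHeight in setupPosition()
    static final double STEP = 0.6;    // each window is offset by 60% of its width

    /** Returns {x, y} for the given crawler, mirroring the if/else in setupPosition(). */
    static int[] positionFor(int crawlerID, int crawlers) {
        int half = crawlers / 2;
        if (crawlerID < half) {
            // Top row
            return new int[] {(int) (crawlerID * WIDTH * STEP), 0};
        }
        // Bottom row, x offsets restart from zero
        return new int[] {(int) ((crawlerID - half) * WIDTH * STEP), HEIGHT};
    }

    public static void main(String[] args) {
        for (int id = 0; id < 8; id++) {
            int[] p = positionFor(id, 8);
            System.out.println("crawler " + id + " -> (" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

With 8 crawlers, IDs 0 to 3 land on the top row at x = 0, 248, 496 and 745, and IDs 4 to 7 repeat those x values at y = 728, so neighbouring windows overlap by about 40% of their width.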