I built a web crawler with Selenium (selenium-server-standalone-2.47.1.jar) and PhantomJS (phantomjs -v returns 1.9.0 on Ubuntu 14.04). The code works on Windows 10 with both FirefoxDriver and PhantomJSDriver, but on Ubuntu 14.04 it works only with FirefoxDriver.
Sample code:
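    import java.util.List;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.phantomjs.PhantomJSDriver;
    import org.openqa.selenium.remote.DesiredCapabilities;

    public class LinkScanner {
        public static void main(String[] args) {
            // Point the driver at the locally installed PhantomJS binary.
            DesiredCapabilities DesireCaps = new DesiredCapabilities();
            DesireCaps.setCapability("phantomjs.binary.path", "/usr/lib/phantomjs/phantomjs");
            WebDriver driver = new PhantomJSDriver(DesireCaps);
            String Url = "https://xxx";
            driver.get(Url);
            // This lookup is the call that fails under PhantomJS on Ubuntu.
            WebElement rootWebElement = driver.findElement(By.id("main"));
            List<WebElement> parentElements = rootWebElement.findElements(By.tagName("li"));
            //243 , 240 (previous)
            for (int i = 106; i < parentElements.size(); i++) {
                // The href attribute is read from the anchor inside each list item.
                WebElement href = parentElements.get(i).findElement(By.tagName("a"));
                // findElement throws NoSuchElementException rather than returning null,
                // so this check never fails; it is kept from the original code.
                if (href != null) {
                    // Scanner is my own class here (java.util.Scanner has no parseXML method).
                    Scanner scanner = new Scanner(href.getAttribute("href"));
                    try {
                        scanner.parseXML(href.getAttribute("href"));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }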
If you open the page source of the URL, you can easily see that a tag with id="main" exists.
Stack trace:
PhantomJS is launching GhostDriver...
[INFO - 2015-08-13T14:15:57.720Z] GhostDriver - Main - running on port 8677
[INFO - 2015-08-13T14:15:58.361Z] Session [d17a3cc0-41c5-11e5-bedb-6fa39763a2c0] - CONSTRUCTOR - Desired Capabilities: {"phantomjs.binary.path":"/usr/lib/phantomjs/phantomjs"}
[INFO - 2015-08-13T14:15:58.370Z] Session [d17a3cc0-41c5-11e5-bedb-6fa39763a2c0] - CONSTRUCTOR - Negotiated Capabilities: {"browserName":"phantomjs","version":"1.9.0","driverName":"ghostdriver","driverVersion":"1.0.3","platform":"linux-unknown-32bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
[INFO - 2015-08-13T14:15:58.371Z] SessionManagerReqHand - _postNewSessionCommand - New Session Created: d17a3cc0-41c5-11e5-bedb-6fa39763a2c0
Exception in thread "main" org.openqa.selenium.NoSuchElementException: Error Message => 'Unable to find element with id 'main''
 caused by Request => {"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"29","Content-Type":"application/json; charset=utf-8","Host":"localhost:8677","User-Agent":"Apache-HttpClient/4.4.1 (Java/1.7.0_79)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"id\",\"value\":\"main\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d17a3cc0-41c5-11e5-bedb-6fa39763a2c0/element"}
Command duration or timeout: 281 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Vmbox', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'i386', os.version: '3.19.0-25-generic', java.version: '1.7.0_79'
*** Element info: {Using=id, value=main}
Session ID: d17a3cc0-41c5-11e5-bedb-6fa39763a2c0
Driver info: org.openqa.selenium.phantomjs.PhantomJSDriver
Capabilities [{platform=LINUX, acceptSslCerts=false, javascriptEnabled=true, browserName=phantomjs, rotatable=false, driverVersion=1.0.3, locationContextEnabled=false, version=1.9.0, cssSelectorsEnabled=true, databaseEnabled=false, handlesAlerts=false, browserConnectionEnabled=false, proxy={proxyType=direct}, nativeEvents=true, webStorageEnabled=false, driverName=ghostdriver, applicationCacheEnabled=false, takesScreenshot=true}]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:206)
    at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:158)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:595)
    at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:348)
    at org.openqa.selenium.remote.RemoteWebDriver.findElementById(RemoteWebDriver.java:389)
    at org.openqa.selenium.By$ById.findElement(By.java:215)
    at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:340)
    at LinkScanner.main(LinkScanner.java:27)
Caused by: org.openqa.selenium.remote.ScreenshotException: Screen shot has been taken
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Vmbox', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'i386', os.version: '3.19.0-25-generic', java.version: '1.7.0_79'
Driver info: driver.version: RemoteWebDriver
    at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:138)
    ... 6 more
Caused by: org.openqa.selenium.NoSuchElementException: Error Message => 'Unable to find element with id 'main''
 caused by Request => {"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"29","Content-Type":"application/json; charset=utf-8","Host":"localhost:8677","User-Agent":"Apache-HttpClient/4.4.1 (Java/1.7.0_79)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"id\",\"value\":\"main\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d17a3cc0-41c5-11e5-bedb-6fa39763a2c0/element"}
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Vmbox', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'i386', os.version: '3.19.0-25-generic', java.version: '1.7.0_79'
Driver info: driver.version: unknown
Answer 0 (score: 3)
Posting my comment as an answer :) You need to install PhantomJS 1.9.8 from https://bitbucket.org/ariya/phantomjs/downloads
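To make the version requirement easier to enforce, here is a minimal sketch of a check that compares a reported PhantomJS version against 1.9.8. It assumes the version string is obtained separately, for example from the output of phantomjs -v or from the "version" field of the negotiated capabilities in the log. PhantomVersionCheck and its methods are hypothetical names invented for this illustration; they are not part of Selenium or GhostDriver, and the sketch only handles purely numeric dotted versions.

```java
// Hypothetical helper for illustration only; not part of Selenium or GhostDriver.
final class PhantomVersionCheck {

    // Parses a dotted numeric version such as "1.9.8" into its integer parts.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Returns true when 'actual' is the same as or newer than 'required'.
    // Missing trailing parts are treated as 0, so "1.9" equals "1.9.0".
    static boolean isAtLeast(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        int length = Math.max(a.length, r.length);
        for (int i = 0; i < length; i++) {
            int left = i < a.length ? a[i] : 0;
            int right = i < r.length ? r[i] : 0;
            if (left != right) {
                return left > right;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Defaults to "1.9.0", the version reported in the question's log.
        String reported = args.length > 0 ? args[0] : "1.9.0";
        if (isAtLeast(reported, "1.9.8")) {
            System.out.println("PhantomJS " + reported + " is 1.9.8 or newer.");
        } else {
            System.out.println("PhantomJS " + reported + " is older than 1.9.8; consider upgrading.");
        }
    }
}
```

Running such a check before creating the PhantomJSDriver would let the crawler fail with a clear message on an outdated install, rather than with a NoSuchElementException later on.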