Downloading a CAPTCHA image with Selenium (Java)

Date: 2014-08-26 09:22:05

Tags: java html selenium xpath imagedownload

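I am trying to fetch a CAPTCHA image.

This follows on from what I described in my previous question. I can now fill in the form and download a CAPTCHA, but the image I get is always a random one, not the one shown in the browser.

Here is my code: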

package testproject;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.concurrent.TimeUnit;

import javax.imageio.ImageIO;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

import org.junit.*;

import static org.junit.Assert.*;
import static org.hamcrest.CoreMatchers.*;

import org.openqa.selenium.*;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.support.ui.Select;

public class testClass {
  private WebDriver driver;
  @Before
  public void setUp() throws Exception {
      //"C:\\Users\\c.farkas\\AppData\Local\\Mozilla Firefox\\Firefox.exe
      System.setProperty("webdriver.firefox.bin","C:\\Users\\c.farkas\\AppData\\Local\\Mozilla Firefox\\Firefox.exe");
    driver = new FirefoxDriver();
  }


  @Test
  public void testtestclass() throws Exception {
      driver.get("http://tudakozo.telekom.hu/main?xml=main&xsl=main");
      driver.findElement(By.xpath("id('session_name')")).sendKeys("Szabó Gábor");
      driver.findElement(By.xpath("id('session_location')")).sendKeys("Gyula");
      System.out.println("cica");
      WebElement img = driver.findElement(By.xpath("//form[@id='searchByName']/table/tbody/tr/td/img")); // or xpath whichever you prefer
      String src = img.getAttribute("src");

   // Create a new trust manager that trust all certificates
      TrustManager[] trustAllCerts = new TrustManager[]{
          new X509TrustManager() {
              public java.security.cert.X509Certificate[] getAcceptedIssuers() {
                  return null;
              }
              public void checkClientTrusted(
                  java.security.cert.X509Certificate[] certs, String authType) {
              }
              public void checkServerTrusted(
                  java.security.cert.X509Certificate[] certs, String authType) {
              }
          }
      };

      // Activate the new trust manager
      try {
          SSLContext sc = SSLContext.getInstance("SSL");
          sc.init(null, trustAllCerts, new java.security.SecureRandom());
          HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
      } catch (Exception e) {
      }


      URL url = new URL(src);
      URLConnection connection = url.openConnection();
      InputStream is = connection.getInputStream();
      BufferedImage bufImgOne = ImageIO.read(url);
      ImageIO.write(bufImgOne, "jpg", new File("test.jpg"));

      // .. then download the file
 /*    System.out.println(src);
      URI uri = new URI(src);
      URL url = uri.toURL();
      BufferedImage bufImgOne = ImageIO.read(url);
      ImageIO.write(bufImgOne, "jph", new File("test.png"));*/
 //     System.out.println(cheesecakes.size() + " cheesecakes:");
   /*   for (int i=0; i<cheesecakes.size(); i++) {
          System.out.println(i+1 + ". " + cheesecakes.get(i).getText());
      }*/
  }

  @After
  public void tearDown() throws Exception {
    driver.quit();
    }
  }

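The problematic part is this:

I use the following to locate the image before downloading it:

WebElement img = driver.findElement(By.xpath("//form[@id='searchByName']/table/tbody/tr/td/img"));

But I always get a random CAPTCHA image. How can I download the specific image that I need? Can I take a Selenium screenshot of a single element? Or should I take a screenshot of the tab and crop it somehow?

The CAPTCHA is at http://tudakozo.telekom.hu/main?xml=main&xsl=main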

2 answers:

Answer 0 (score: 3)

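Just provide the XPath of the CAPTCHA image and use Selenium to take a screenshot of it. Selenium has a built-in option for taking screenshots:

WebDriver driver = new FirefoxDriver();
driver.get("URL");
// Saves what the browser is currently showing to a temporary PNG file
File scrFile = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
// FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils)
FileUtils.copyFile(scrFile, new File("c:\\tmp\\screenshot.png"));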

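Because these CAPTCHAs are dynamic, downloading the image URL again is likely to return a different image. Instead, locate the CAPTCHA by its XPath and save a screenshot of what the browser is actually displaying.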
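
The screenshot call above captures the page rather than a single element, so one way to isolate the CAPTCHA is to crop the screenshot to the element's bounding box. The following is only a minimal sketch of that idea using the standard library: `CaptchaCrop` and `crop` are hypothetical names, not part of Selenium, and the sketch assumes the element's coordinates map one-to-one onto screenshot pixels.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not a Selenium class): crops a page screenshot
// to the rectangle occupied by one element.
class CaptchaCrop {

    /**
     * Decodes PNG bytes and returns the region starting at (x, y) with the
     * requested size. The rectangle is clamped so it stays inside the image.
     */
    static BufferedImage crop(byte[] pngBytes, int x, int y, int width, int height)
            throws IOException {
        BufferedImage page = ImageIO.read(new ByteArrayInputStream(pngBytes));
        if (page == null) {
            throw new IOException("Screenshot bytes could not be decoded as an image");
        }
        // Clamp the rectangle so getSubimage never reaches outside the screenshot.
        int left = Math.max(0, Math.min(x, page.getWidth() - 1));
        int top = Math.max(0, Math.min(y, page.getHeight() - 1));
        int w = Math.max(1, Math.min(width, page.getWidth() - left));
        int h = Math.max(1, Math.min(height, page.getHeight() - top));
        return page.getSubimage(left, top, w, h);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real screenshot: a blank 200x100 PNG built in memory.
        BufferedImage fake = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(fake, "png", out);

        BufferedImage captcha = crop(out.toByteArray(), 50, 20, 80, 30);
        System.out.println(captcha.getWidth() + "x" + captcha.getHeight()); // prints 80x30
    }
}
```

With a live driver, the bytes would come from ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES), and the rectangle from img.getLocation() and img.getSize() on the WebElement found by the XPath. The result can then be saved with ImageIO.write(captcha, "png", new File("captcha.png")). If the screenshot covers only the visible viewport, or the display scales pixels, the coordinates would need adjusting, which this sketch does not attempt.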

Come back if you have any further questions. Happy coding :)

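Answer 1 (score: 1)

The purpose of a CAPTCHA system is precisely to prevent what you are apparently trying to do: the automated interpretation of CAPTCHA images.

That said, if you want to take a screenshot of the page, see this page of the official documentation.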