Java: How can I scrape images from Amazon using Selenium?

Time: 2015-01-23 03:47:35

Tags: java selenium xpath

I am trying to use Selenium WebDriver to scrape the 6 images on the left side of the page at this Amazon URL:

http://www.amazon.com/EasyAcc%C2%AE-10000mAh-Brilliant-Smartphone-Bluetooth/dp/B00H9BEC8E

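However, everything I try results in an error. What I have tried so far:

  1. I tried using an XPath to grab the image directly and then the "getAttribute" method to extract the src. For example, for the first image on the page, the XPath is:

    .//*[@id='a-autoid-2']/span/input

  2. So I tried the following: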

      String path1 = ".//*[@id='a-autoid-2']/span/input";
      String url = "http://www.amazon.com/EasyAcc%C2%AE-10000mAh-Brilliant-Smartphone-Bluetooth/dp/B00H9BEC8E";
      WebDriver driver = new FirefoxDriver();
      driver.get(url);
      WebElement s;
      s = driver.findElement(By.xpath(path1));
      String src;
      src = s.getAttribute("src");
      System.out.println(src);
    

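    But I cannot retrieve the src.

    Note: this problem only occurs when scraping images from certain types of products. For example, I can easily scrape the image from this product using Selenium: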

    http://www.amazon.com/Ultimate-Unification-Diet-Health-Disease/dp/0615797806/

    import java.util.List;
    
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.firefox.FirefoxDriver;
    
    public class mytest {
    
        public static void main(String[] args) {
            // TODO Auto-generated method stub
    
    
    
            String path = ".//*[@id='imgThumbs']/div[2]/img";
    
            String url = "http://www.amazon.com/Ultimate-Unification-Diet-Health-Disease/dp/0615797806/";
            WebDriver driver = new FirefoxDriver();
            driver.get(url);
    
    
            WebElement s;
            s = driver.findElement(By.xpath(path));
            String src;
            src = s.getAttribute("src");
            System.out.println(src);
    
            driver.close();
    
    
        }
    }
    

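    This code works perfectly. It is only for certain products that I cannot find a way around the problem.

    1. I tried clicking on the image, which opens an iframe, but I could not scrape the image from that iframe either, even after switching to it:

      driver.switchTo().frame(IFRAMEID);

    2. I know I could use the screenshot method, but I would like to know whether there is a way to scrape the images directly.

      Thanks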

1 Answer:

Answer 0: (score: 0)

Try this code:

    String path = "//div[@id='imageBlock_feature_div']//span/img";

    String url = "http://rads.stackoverflow.com/amzn/click/0615797806";
    WebDriver driver = new FirefoxDriver();
    driver.get(url);

    List<WebElement> srcs;
    srcs = driver.findElements(By.xpath(path));

    for(WebElement src : srcs) {
        System.out.println(src.getAttribute("src"));
    }

    driver.close();

Result:

2015-01-23 12:36:14 [main]-[INFO] Opened url: http://rads.stackoverflow.com/amzn/click/B00H9BEC8E
http://ecx.images-amazon.com/images/I/41cOP3mFX3L._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/51YkMhRXqcL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/51nSbXF%2BCTL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/31s%2B31F%2BQmL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/41FmTOJEOOL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/41U6qpLJ07L._SX38_SY50_CR,0,0,38,50_.jpg
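
Note that the URLs printed above look like small thumbnails (see the SX38_SY50 part of each file name). As a minimal sketch, and assuming that the segment between "._" and "_." before the extension is only a resize modifier that can be dropped to reach the larger source image (this is an observed URL pattern, not a documented Amazon API), a plain-Java helper could look like the following. The names `ThumbnailUrls` and `fullSize` are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of Selenium or of any Amazon API.
final class ThumbnailUrls {

    // Matches "._<anything without a slash>_." plus the file extension at the end of the URL.
    private static final Pattern MODIFIER = Pattern.compile("\\._[^/]*_\\.(\\w+)$");

    // Removes the assumed resize modifier; URLs without one are returned unchanged.
    static String fullSize(String thumbnailUrl) {
        return MODIFIER.matcher(thumbnailUrl).replaceFirst(".$1");
    }

    public static void main(String[] args) {
        // Pass the src values printed by the Selenium loop as arguments.
        for (String src : args) {
            System.out.println(fullSize(src));
        }
    }
}
```

Each src collected in the loop can be passed through `fullSize` before further use. If a URL does not follow the assumed pattern, it comes back unchanged, so the helper should be safe to apply to every result.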

However, to get Amazon images, I suggest trying the Amazon API instead: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

It is a much better approach.
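
If you stay with scraped src values, they can be written to disk directly, without screenshots. Here is a minimal sketch that uses only the JDK. The names `ImageSaver`, `fileNameOf`, `save` and the `images` directory are illustrative, and it assumes the image host answers a plain HTTP GET without cookies or special headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for saving scraped image URLs; not part of Selenium.
final class ImageSaver {

    // Uses the last (still percent-encoded) path segment of the URL as the local file name.
    static String fileNameOf(String src) {
        String path = URI.create(src).getRawPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Streams the bytes behind src into targetDir and returns the written file.
    static Path save(String src, Path targetDir) {
        Path target = targetDir.resolve(fileNameOf(src));
        try (InputStream in = URI.create(src).toURL().openStream()) {
            Files.createDirectories(targetDir);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the src values printed by the Selenium loop as arguments.
        for (String src : args) {
            System.out.println(save(src, Paths.get("images")));
        }
    }
}
```

Inside the Selenium loop, `ImageSaver.save(src.getAttribute("src"), Paths.get("images"))` would replace the `println` call. The file name keeps the URL's percent-encoding (for example `%2B`), which avoids surprises from decoded characters.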