Parsing an Amazon page with jsoup returns status 204

Time: 2013-04-05 09:38:34

Tags: java html parsing jsoup

Example page: http://www.amazon.com/gp/offer-listing/1589942140

public void connect( String url ) {        
    this.conn = Jsoup.connect( url );  
}

/**
 * Executes the request and parses the result.
 * @return true if the page was fetched and parsed, false if an IOException occurred
 */
public boolean parse() 
{
    try {
        this.page = this.conn.get();
        return true;
    } catch (IOException ex) {
        // log it here
        System.out.format("Error: %s%n", ex);
        return false;
    }
}    

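Parsing the page throws the following IOException:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=204, URL=http://www.amazon.com/gp/offer-listing/1589942140

I also tried the native Java URL class, as shown below, and it does not throw an IOException: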

    try {
        URL myURL = new URL("http://www.amazon.com/gp/offer-listing/1589942140");
        URLConnection myURLConnection = myURL.openConnection();
        myURLConnection.connect();
        System.out.format("%s", myURLConnection.getContentType());
    } 
    catch (MalformedURLException e) { 
        // new URL() failed
        System.out.format("Error: %s%n", e);
    } 
    catch (IOException e) {   
        // openConnection() or connect() failed
        System.out.format("Error: %s%n", e);
    }

Any idea why this happens?

1 answer:

Answer 0: (score: 0)

The following works for me:

    System.out.println(Jsoup.connect("http://www.amazon.com/gp/offer-listing/1589942140").userAgent("Mozilla").get().text());

The URL used above is the one you specified (example page: http://www.amazon.com/gp/offer-listing/1589942140).
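To illustrate why the user agent might matter, here is a minimal sketch that uses only the JDK. It assumes the server answers differently depending on the User-Agent header, which nothing in this thread confirms for Amazon. The class name UserAgentProbe, the helper names, and the local stub server are all hypothetical stand-ins, so the example runs without network access. It also shows that the status code is only visible on the native connection if you ask for it via getResponseCode(); calling connect() alone does not turn a non-200 status into an exception.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentProbe {

    /** Sends a GET with the given User-Agent and returns the HTTP status code. */
    static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try {
            // getResponseCode() sends the request and reads the status line.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Hypothetical stand-in for the remote site (an assumption, not Amazon's
     * real behavior): 200 for agents containing "Mozilla", 204 otherwise.
     */
    static HttpServer startStubServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            if (ua != null && ua.contains("Mozilla")) {
                byte[] body = "<html><body>offers</body></html>".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } else {
                exchange.sendResponseHeaders(204, -1); // -1 means no response body
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startStubServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println(fetchStatus(url, "Java"));    // prints 204
            System.out.println(fetchStatus(url, "Mozilla")); // prints 200
        } finally {
            server.stop(0);
        }
    }
}
```

If you want to stay with jsoup and inspect the status instead of catching the exception, Jsoup.connect(url).ignoreHttpErrors(true).execute().statusCode() should give you the code without throwing. I have not verified that against this particular page.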