Question

In my small application I use the Jsoup framework to download HTML, but my code does not work with some URLs. Here is my code:

{{1}}

It works with some URLs:

http://www.google.com, http://www.amazon.es

but not with others:

http://www.topix.com
http://www.wittyfeed.com
http://www.wittyfeed.com...

The error is:

org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
Practica1.prueba.main(prueba.java:34)
...

What could be causing this behavior?

Answer 1

First, you need to print the exception you get when you try to connect to the URL. It is:

http://www.topix.comorg.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://www.topix.com

So add a user agent, like this:

Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
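Both steps (printing the error, then sending a browser-like user agent) can be combined in one helper. This is a minimal sketch, assuming jsoup is on the classpath; the class name FetchWithDiagnostics and the methods describe and fetch are hypothetical. It sends the User-Agent string from the line above and, if the server still answers with an error status, prints the status code and URL carried by jsoup's HttpStatusException before rethrowing it.

```java
import java.io.IOException;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchWithDiagnostics {

    // Browser-like User-Agent string, copied from the answer above
    static final String BROWSER_UA =
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";

    // Turns a jsoup HTTP error into a short, readable message (hypothetical helper)
    static String describe(HttpStatusException e) {
        return "HTTP " + e.getStatusCode() + " for " + e.getUrl();
    }

    // Fetches the page with the browser-like User-Agent; on an HTTP error status
    // it prints the status code and URL, then rethrows so the caller can react
    static Document fetch(String url) throws IOException {
        try {
            return Jsoup.connect(url).userAgent(BROWSER_UA).get();
        } catch (HttpStatusException e) {
            System.out.println(describe(e));
            throw e;
        }
    }
}
```

For example, FetchWithDiagnostics.fetch("http://www.topix.com") either returns the parsed Document or prints a line such as "HTTP 403 for http://www.topix.com" and rethrows, which makes it easy to see which sites reject the request.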

Here is your code with the changes:

import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;


public class JsonExample {

    public static void main(String[] args) {

        String html = null;

        // Download the HTML
        String url = "http://www.topix.com";
        // A browser-like User-Agent avoids the 403 that topix.com returned without one
        Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
        try {
            Response resp = conn.execute();
            if (resp.statusCode() != 200) {
                System.out.println("Error: " + resp.statusCode());
            } else {
                System.out.println(Thread.currentThread().getName() + " is downloading " + url);
                // Parse the body that execute() already downloaded instead of sending a second request
                html = resp.parse().html();
            }
        } catch (IOException e) {
            // printStackTrace() prints the trace; println(e.getStackTrace()) only prints an array reference
            e.printStackTrace();
            System.out.println(Thread.currentThread().getName() + " could not connect to " + url + ": " + e);
        }
    }
}
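One detail about the status check above: by default jsoup throws an HttpStatusException for HTTP error responses (4xx and 5xx), so a 403 ends up in the catch block rather than in the statusCode() != 200 branch. If you prefer to inspect the status yourself, Connection.ignoreHttpErrors(true) turns that exception off. The following is a minimal sketch of that variant, assuming jsoup is on the classpath; the class StatusCheckExample and its methods prepare and fetchIfOk are hypothetical names.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StatusCheckExample {

    // Builds the connection without sending it; ignoreHttpErrors(true) stops jsoup
    // from throwing HttpStatusException on 4xx/5xx so the status can be checked manually
    static Connection prepare(String url, String userAgent) {
        return Jsoup.connect(url)
                .userAgent(userAgent)
                .ignoreHttpErrors(true);
    }

    // Executes the request; returns the parsed page, or null when the status is not 200
    static Document fetchIfOk(String url, String userAgent) throws IOException {
        Connection.Response resp = prepare(url, userAgent).execute();
        if (resp.statusCode() != 200) {
            System.out.println("Error: " + resp.statusCode() + " " + resp.statusMessage());
            return null;
        }
        // parse() reuses the body already downloaded by execute()
        return resp.parse();
    }
}
```

With this setup a 403 from a site prints as "Error: 403 ..." through the normal branch, while network failures such as an unknown host still arrive as an IOException.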

Answer 2

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// The class and method names are illustrative; doc is the page fetched earlier
public class LinkFollower {

    static void followHttpsLinks(Document doc) throws IOException {
        Elements link = doc.select("a");
        System.out.println(link.size());
        int c = 0;
        String[] prices = new String[link.size()];
        for (int i = 0; i < link.size(); i++) {
            prices[i] = link.get(i).attr("href");
            if (prices[i].contains("https")) {
                c++;
                // Decode the escaped '+' (%2B) and '=' (%3D) characters in the link
                String nurl = prices[i].replace("%2B", "+");
                String surl = nurl.replace("%3D", "=");
                System.out.println(prices[i]);
                System.out.println(c + "\t" + surl);
                // Alternative: Response doc2 = Jsoup.connect(surl).execute();
                Document doc3 = Jsoup.connect(surl).post();
                String blk = doc3.html();
            }
        }
    }
}

Answer 3

For me, changing the user agent to "Mozilla/5.0" solved the problem.

Document doc = Jsoup.connect(url)
                            .userAgent("Mozilla/5.0")
                            .timeout(30000)
                            .get();
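When several URLs are fetched (such as the ones listed in the question), it may help to keep these settings in one place so that no request is sent without them. This is a minimal sketch, assuming jsoup is on the classpath; PageFetcher, connect and fetch are hypothetical names, and the values are the ones from this answer. Note that timeout() takes milliseconds, so 30000 means 30 seconds.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PageFetcher {

    // Settings taken from the answer above
    static final String USER_AGENT = "Mozilla/5.0";
    static final int TIMEOUT_MS = 30000; // milliseconds

    // Single place that configures every request with the same user agent and timeout
    static Connection connect(String url) {
        return Jsoup.connect(url)
                .userAgent(USER_AGENT)
                .timeout(TIMEOUT_MS);
    }

    // Sends a GET request and parses the response
    static Document fetch(String url) throws IOException {
        return connect(url).get();
    }
}
```

Calling PageFetcher.fetch(url) for each URL then sends the same User-Agent and timeout every time; if one site still refuses, only the constant needs to change (for example to the longer Chrome string from Answer 1).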

Why can't Jsoup connect to some URLs?
