I'm using the Jsoup framework in my small application to download HTML, but my code doesn't work for some URLs. For example, it fails with:
http://www.topix.com
http://www.wittyfeed.com
http://www.wittyfeed.com...
It does work with other URLs, such as:
http://www.google.com, http://www.amazon.es
For the failing URLs I get this stack trace:
org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590),
org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540),
org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227),
Practica1.prueba.main(prueba.java:34)
...
The error is:
{{1}}
What could be causing this behavior?
Answer 0 (score: 2)
First, print the exception you get when trying to connect to the URL. For this URL it is:
http://www.topix.comorg.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://www.topix.com
The server rejects the request with 403 Forbidden, so set a browser-like User-Agent, as follows:
Connection conn = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
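Here is your code with the changes applied:
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public class JsonExample {
    public static void main(String[] args) {
        String html = null;
        // Download the HTML
        String url = "http://www.topix.com";
        // Send a browser-like User-Agent; this site answers 403 to Jsoup's default one
        Connection conn = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
        try {
            // execute() throws HttpStatusException (an IOException) for 4xx/5xx
            // responses unless ignoreHttpErrors(true) is set on the connection
            Response resp = conn.execute();
            if (resp.statusCode() != 200) {
                System.out.println("Error: " + resp.statusCode());
            } else {
                System.out.println(Thread.currentThread().getName() + " is downloading " + url);
                // Parse the response already fetched instead of sending a second request
                html = resp.parse().html();
                System.out.println("Downloaded " + html.length() + " characters");
            }
        } catch (IOException e) {
            // Print the actual trace; println(e.getStackTrace()) only prints the array's identity
            e.printStackTrace();
            System.out.println(Thread.currentThread().getName() + ": cannot connect to " + url + ": " + e);
        }
    }
}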
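Why printing the exception properly matters: the original catch block called `System.out.println(e.getStackTrace())`, which prints only the default `toString()` of a `StackTraceElement[]`, not the trace or the 403 message. A minimal stdlib-only sketch of the difference; the `IOException` here is simulated (its message is modeled on the error quoted above) and the class and method names are hypothetical. No network call is made.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

public class PrintExceptionDemo {
    // Simulated failure; the message mimics the HttpStatusException text quoted above
    static IOException simulatedFailure() {
        return new IOException("HTTP error fetching URL. Status=403, URL=http://www.topix.com");
    }

    // What the original code printed: the identity string of a StackTraceElement[]
    static String arrayIdentity(Exception e) {
        return String.valueOf(e.getStackTrace());
    }

    // What e.printStackTrace() writes: exception class, message, then one "\tat ..." line per frame
    static String fullTrace(Exception e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    public static void main(String[] args) {
        IOException e = simulatedFailure();
        System.out.println(arrayIdentity(e)); // something like [Ljava.lang.StackTraceElement;@<hash>
        System.out.println(fullTrace(e));     // starts with java.io.IOException: HTTP error fetching URL. Status=403, ...
    }
}
```

Only the second form shows the status code that points to the User-Agent problem.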
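Answer 1 (score: 0)
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical wrapper class; the fragment assumes an already-downloaded page `doc`
public class LinkFetcher {
    // Follows every link whose href contains "https" and downloads that page
    static void fetchHttpsLinks(Document doc) throws IOException {
        Elements link = doc.select("a");
        System.out.println(link.size());
        int c = 0;
        String[] prices = new String[link.size()];
        for (int i = 0; i < link.size(); i++) {
            prices[i] = link.get(i).attr("href");
            if (prices[i].contains("https")) {
                c++;
                // Undo the percent-encoding of '+' (%2B) and '=' (%3D)
                String nurl = prices[i].replace("%2B", "+");
                String surl = nurl.replace("%3D", "=");
                System.out.println(prices[i]);
                System.out.println(c + "\t" + surl);
                // Download the linked page with GET (the original called post())
                Document doc3 = Jsoup.connect(surl).get();
                // blk holds the linked page's HTML
                String blk = doc3.html();
            }
        }
    }
}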
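The fragment above decodes `%2B` and `%3D` by hand. A sketch of the general alternative with the JDK's `URLDecoder` (assumes Java 10+ for the `Charset` overload; class, method names and the sample href are hypothetical). One behavioral difference to keep in mind: `URLDecoder` applies form decoding, so a literal `+` also becomes a space, which the hand-written `replace` calls do not do.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class HrefDecoding {
    // Hand-written decoding from the answer: only %2B and %3D are handled
    static String manualDecode(String href) {
        return href.replace("%2B", "+").replace("%3D", "=");
    }

    // Form decoding of every %XX escape (and literal '+' -> ' ') using the JDK
    static String jdkDecode(String href) {
        return URLDecoder.decode(href, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical href, for illustration only
        String href = "https://example.com/search?q=a%2Bb%3Dc%26d";
        System.out.println(manualDecode(href)); // https://example.com/search?q=a+b=c%26d
        System.out.println(jdkDecode(href));    // https://example.com/search?q=a+b=c&d
    }
}
```

The manual version leaves `%26` untouched, while `URLDecoder` decodes it to `&`; which behavior is wanted depends on how the resulting URL is used.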
Answer 2 (score: 0)
For me, changing the user agent to "Mozilla/5.0" solved the problem.
Document doc = Jsoup.connect(url)
.userAgent("Mozilla/5.0")
.timeout(30000)
.get();
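For comparison, a sketch of the same two settings with the JDK's own `java.net.http` client (Java 11+), which is not what any of the answers use. Jsoup's `timeout(30000)` is in milliseconds, i.e. 30 seconds; `HttpRequest.Builder.timeout` takes a `Duration` and only roughly corresponds, since it bounds waiting for the response. The class and method names are hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class UserAgentRequest {
    // Builds (but does not send) a GET with the answer's User-Agent and a 30 s timeout
    static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0")
                .timeout(Duration.ofMillis(30000))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://www.topix.com");
        System.out.println(req.headers().firstValue("User-Agent").orElse("(none)")); // Mozilla/5.0
        System.out.println(req.timeout().orElseThrow());                             // PT30S
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`; whether a given site accepts this User-Agent is up to that site.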