Recently I've been trying to use jsoup to parse web pages. I have this code that connects to a URL:
page.url = "https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000";
Connection conn = Jsoup.connect(page.url);
Document htmlDocument = conn.get();
this.htmlDocument = htmlDocument;
if (!conn.response().contentType().contains("text/html")) {
    System.out.println("**Failure**\nRetrieved something other than HTML");
    return false;
}
I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: Must supply a valid URL
at org.jsoup.helper.Validate.notEmpty(Validate.java:102)
at org.jsoup.helper.HttpConnection.url(HttpConnection.java:74)
at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:38)
at org.jsoup.Jsoup.connect(Jsoup.java:73)
The URL works in the browser, so I don't understand why it fails with jsoup.
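The stack trace points at Validate.notEmpty called from HttpConnection.url. In jsoup, that check throws "Must supply a valid URL" only when the URL string is null or empty. So when Jsoup.connect(page.url) ran, page.url was most likely empty, not malformed. Below is a minimal sketch of a guard that fails earlier with a clearer message. It assumes plain Java with no jsoup dependency. UrlGuard and requireHttpUrl are hypothetical names, and accepting only absolute http/https URLs is an assumption about what the crawler needs:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlGuard {
    // Hypothetical helper: rejects a bad URL string before it ever reaches Jsoup.connect().
    static String requireHttpUrl(String url) {
        // jsoup's Validate.notEmpty throws "Must supply a valid URL" for exactly this case
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("URL is null or empty (was the field set before connecting?)");
        }
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + url, e);
        }
        String scheme = uri.getScheme(); // null for relative URLs such as "pdf9/x.pdf"
        if (scheme == null || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            throw new IllegalArgumentException("Expected an absolute http(s) URL: " + url);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(requireHttpUrl("https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000"));
        try {
            requireHttpUrl("");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling UrlGuard.requireHttpUrl(page.url) immediately before Jsoup.connect(...) would report an unset field with an explicit message instead of jsoup's generic one.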
Answer 0 (score: 0)
Jsoup is an HTML parser and cannot parse PDFs. Before handing a URL to jsoup, you can use HttpURLConnection to check what the URL actually returns:
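import java.net.HttpURLConnection;
import java.net.URL;
// Runs inside a method declared "throws IOException"
String url4e = "https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000";
URL url1 = new URL(url4e);
HttpURLConnection conn = (HttpURLConnection) url1.openConnection();
conn.setRequestMethod("GET");
conn.connect();
// Content-Type reported by the server, e.g. "application/pdf" for a PDF file
System.out.println(conn.getContentType());
conn.disconnect();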
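Following the answer's idea, here is a minimal sketch that checks the Content-Type first and only then decides whether jsoup is the right tool. The order matters. By default, jsoup's get() does not return a Document for a response such as application/pdf. It throws UnsupportedMimeTypeException unless ignoreContentType(true) is set, so the contentType() check in the question runs too late. The sketch sends a HEAD request instead of the answer's GET, on the assumption that the server returns the same headers for HEAD. Some servers do not, and in that case the GET version in the answer still works. ContentTypeCheck, isHtml and fetchContentType are hypothetical names:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

public class ContentTypeCheck {
    // True if a Content-Type header value names HTML; parameters such as charset are ignored.
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false; // the server sent no Content-Type header
        }
        // "text/html; charset=UTF-8" -> "text/html"
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html");
    }

    // Asks the server for the Content-Type with a HEAD request, so the PDF body is not downloaded.
    static String fetchContentType(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return conn.getContentType(); // may be null
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ContentTypeCheck <url>");
            return;
        }
        String type = fetchContentType(args[0]);
        if (isHtml(type)) {
            System.out.println("HTML (" + type + "): OK to parse with jsoup");
        } else {
            System.out.println("Not HTML (" + type + "): use a PDF library instead of jsoup");
        }
    }
}
```

For example: java ContentTypeCheck "https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000". isHtml compares only the media type, so a header like text/html; charset=UTF-8 still counts as HTML. The question's contains("text/html") check also accepts that header.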