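I'm trying to fetch a web page with Jsoup, but I'm running into a problem with UTF-8 characters in the URL. The following code works for URLs without special characters: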
String linkTeam = "https://www.fifaindex.com/de/team/" + team.getId() + "/" + URLEncoder.encode(team.getName().replaceAll(" ", ""),"UTF-8");
System.out.println(linkTeam);
String name = URLEncoder.encode(team.getName().replaceAll(" ", ""), "UTF-8");
System.out.println(name);
Document doc = Jsoup.connect(linkTeam).get();
Elements strength = doc.getElementsByClass("badge r3");
team.setSturm(Integer.parseInt(strength.get(0).text()));
team.setMittelfeld(Integer.parseInt(strength.get(1).text()));
team.setAbwehr(Integer.parseInt(strength.get(2).text()));
return team;
This is the output for a URL that contains UTF-8 characters:
https://www.fifaindex.com/de/team/1877/BocaJuniors
BocaJuniors
https://www.fifaindex.com/de/team/110395/Lan%C3%BAs
Lan%C3%BAs
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error
fetching URL. Status=404,
URL=https://www.fifaindex.com/de/team/110395/Lan%25C3%25BAs
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:679)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:628)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:260)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:249)
at fifa.scraper.Scraper.getTeamStrength(Scraper.java:71)
at fifa.scraper.Scraper.loadTeams(Scraper.java:60)
at fifa.scraper.Scraper.main(Scraper.java:23)
When I use Jsoup.connect, "25" gets added to the URL (each "%" turns into "%25"). The URL that I print with System.out.println() works fine. Without URLEncoder in the first line, the output is:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://www.fifaindex.com/de/team/110395/Lan%2525C3%2525BAs/
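So how can I connect to a URL that contains UTF-8 characters? Thanks for your help.
Answer 0 (score: 1)
In your case the problem is that the server responds with a 301 status and a Location header that already contains the encoded URL, and Jsoup then encodes it again. The code snippet below works for me: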
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

private static Document sendRequest(String url) {
    Document doc = null;
    try {
        Connection connect = Jsoup.connect(url);
        // Handle redirects manually so that the already-encoded Location header is not encoded again.
        connect.request().followRedirects(false);
        URI u = new URI(url);
        // Rebuild the URL with a decoded path before handing it to Jsoup.
        doc = connect.url(new URI(u.getScheme(), u.getUserInfo(), u.getHost(), u.getPort(),
                URLDecoder.decode(u.getPath(), "UTF-8"), u.getQuery(), u.getFragment()).toURL()).get();
        // On a 301, repeat the request against the redirect target.
        if (connect.response().statusCode() == 301 && connect.response().header("Location") != null) {
            return sendRequest(connect.response().header("Location"));
        }
        return doc;
    } catch (IOException e) {
        e.printStackTrace();
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
    // Still null here if the request failed.
    return doc;
}

public static void main(String[] args) {
    String url = null;
    url = "https://www.fifaindex.com/de/team/110395/Lanús";
    // url = "https://www.fifaindex.com/de/team/1877/BocaJuniors";
    Document doc = sendRequest(url);
    Elements strength = doc.getElementsByClass("badge r3");
    System.out.println(Integer.parseInt(strength.get(0).text()));
    System.out.println(Integer.parseInt(strength.get(1).text()));
    System.out.println(Integer.parseInt(strength.get(2).text()));
}
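To see where the "%25" in the failing URLs comes from, here is a small sketch that uses only java.net classes and makes no network calls. The class name DoubleEncodingDemo and the helpers encodeSegment and withDecodedPath are made up for this illustration; they are not part of Jsoup. The sketch shows that percent-encoding a segment that is already encoded turns every "%" into "%25", and that rebuilding the URL from its decoded components gives a form that has not been encoded yet.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class DoubleEncodingDemo {

    // Hypothetical helper: percent-encodes one path segment as UTF-8.
    static String encodeSegment(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: rebuilds the URL from its decoded components,
    // so percent-escapes in the path (e.g. %C3%BA) become plain characters.
    static String withDecodedPath(String url) {
        try {
            URI u = new URI(url);
            // URI.getPath() already returns the decoded path.
            return new URI(u.getScheme(), u.getUserInfo(), u.getHost(), u.getPort(),
                    u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String once = encodeSegment("Lan\u00FAs");   // "\u00FA" is the letter u with acute accent
        String twice = encodeSegment(once);           // encoding the encoded form again
        System.out.println(once);   // prints Lan%C3%BAs
        System.out.println(twice);  // prints Lan%25C3%25BAs
        // Decoding the path of the already-encoded URL restores the plain name.
        System.out.println(withDecodedPath("https://www.fifaindex.com/de/team/110395/Lan%C3%BAs/"));
    }
}
```

Because URI.getPath() already returns the decoded path, this sketch leaves out the extra URLDecoder.decode call used in the snippet above. For paths like the ones in the question, both variants should give the same result.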