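I am trying to use jsoup / Java to fetch Google News articles for a topic entered by the user. However, when I try to access the Google News page, I get a runtime error from this line: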
try {
    doc = (org.jsoup.nodes.Document) Jsoup.connect("https://www.google.com/search?hl=en&gl=us&tbm=nws&authuser=0&q="+ "technology").get();
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
When I run this code, I get this error:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://www.google.com/search?hl=en&gl=us&tbm=nws&authuser=0&q=technology
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
at newsbot.NewsBot.onUpdateReceived(NewsBot.java:93)
at org.telegram.telegrambots.updatesreceivers.BotSession$HandlerThread.run(BotSession.java:197)
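However, if I type the link into Google, the page I want to access loads perfectly. I would really appreciate your help, thanks.
Answer 0 (score: 0)
You need to set a user agent on the request: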
Jsoup.connect("https://www.google.com/search?hl=en&gl=us&tbm=nws&authuser=0&q="+ "technology")
.userAgent("blah-blah")
.get();
Answer 1 (score: 0)
You can set a user agent so that the request is not rejected as forbidden (HTTP 403):
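// get() already returns org.jsoup.nodes.Document, so no cast is needed
Document doc = Jsoup
        .connect("https://www.google.com/search?hl=en&gl=us&tbm=nws&authuser=0&q=" + "technology")
        .ignoreContentType(true)
        // Send a browser-like User-Agent so the request is not rejected with HTTP 403
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
        .get();
System.out.println(doc);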
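Since the topic comes from user input, it may also be worth URL-encoding it before appending it to the query string, so that spaces or characters such as & do not break the URL. A minimal sketch using only the JDK (Java 10 or later); the class and method names (NewsUrlBuilder, buildSearchUrl) are hypothetical and not part of jsoup:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NewsUrlBuilder {
    // Base URL taken from the question; the topic is appended as the q parameter.
    static final String BASE =
            "https://www.google.com/search?hl=en&gl=us&tbm=nws&authuser=0&q=";

    // Hypothetical helper: trims the user's topic and percent-encodes it
    // (spaces become '+', '&' becomes "%26") before building the URL.
    static String buildSearchUrl(String topic) {
        return BASE + URLEncoder.encode(topic.trim(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("technology"));
        System.out.println(buildSearchUrl("machine learning & AI"));
    }
}
```

The returned string could then be passed to Jsoup.connect(...) in place of the hand-concatenated URL, followed by .userAgent(...) and .get() as shown above.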