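I need to fetch the HTML page of the first search result on Google.
To do this I use Google's "I'm Feeling Lucky" feature, which basically means appending &btnI to the search query URL.
For example, http://www.google.com/search?q=%22movie%22+site:amazon.com&btnI redirects to a movie-related page on amazon.com.
Let this be our searchQuery: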
String searchQuery = "http://www.google.com/search?q=%22movie%22+site:amazon.com&btnI";
URL url = new URL(searchQuery);
// openStream() sends Java's default User-Agent; no request headers are set here
InputStream response = url.openStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(response));
for (String line; (line = reader.readLine()) != null;) {
    System.out.println(line);
}
reader.close();
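I get:
Error: Server returned HTTP response code: 403 for URL: http://www.google.com/search?q=%22movie%22+site:amazon.com&btnI
If there is a better way to do this, I would appreciate some help with that too. Please let me know!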
Answer 0 (score: 1)
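Try using HttpURLConnection.
Then enable redirect following: HttpURLConnection.setFollowRedirects(true) is the static, JVM-wide switch, and setInstanceFollowRedirects(true) applies to a single connection.
Also set the User-Agent to that of a browser such as Firefox or IE.
Like this: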
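// Cast to HttpURLConnection: the redirect setting is not available on plain URLConnection
HttpURLConnection connection = (HttpURLConnection) new URL(searchQuery).openConnection();
// Per-connection setting; setFollowRedirects(boolean) is static and affects every connection
connection.setInstanceFollowRedirects(true);
// Pretend to be a browser so the request does not carry Java's default User-Agent
connection.setRequestProperty("User-Agent",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2");
connection.connect();
InputStream response = connection.getInputStream();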
...
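
For completeness, here is a rough, self-contained sketch of the same idea, not a definitive implementation. One thing to be aware of: HttpURLConnection does not automatically follow a redirect that switches protocol (http to https or the reverse), so this version follows the Location header by hand. The class name LuckyFetch, the helpers buildLuckyUrl and fetch, and the maxHops limit are all made up for illustration; it assumes Java 10 or later for URLEncoder.encode(String, Charset). There is no guarantee that Google will accept the request even with a browser User-Agent.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LuckyFetch {
    // Browser-like User-Agent string taken from the answer above.
    static final String USER_AGENT =
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2";

    // Hypothetical helper: URL-encodes the query and appends btnI.
    // Note that URLEncoder also encodes ':' as %3A, unlike the hand-written URL in the question.
    static String buildLuckyUrl(String query) {
        return "http://www.google.com/search?q="
            + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&btnI";
    }

    // Hypothetical helper: requests the address, following at most maxHops
    // redirects manually so that http <-> https hops are followed as well.
    static String fetch(String address, int maxHops) throws IOException {
        URL url = new URL(address);
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setInstanceFollowRedirects(false); // redirects are handled below
            connection.setRequestProperty("User-Agent", USER_AGENT);

            int status = connection.getResponseCode();
            String location = connection.getHeaderField("Location");
            if (status >= 300 && status < 400 && location != null) {
                url = new URL(url, location); // also resolves relative Location values
                connection.disconnect();
                continue;
            }

            // For 4xx/5xx responses getInputStream() throws an IOException,
            // which is where the 403 from the question would surface.
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    connection.getInputStream(), StandardCharsets.UTF_8))) {
                for (String line; (line = reader.readLine()) != null;) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        }
        throw new IOException("Too many redirects for " + address);
    }
}
```

You would call it with something like LuckyFetch.fetch(LuckyFetch.buildLuckyUrl("\"movie\" site:amazon.com"), 5) and print the returned string. The response is decoded as UTF-8 here for simplicity; a more careful version would read the charset from the Content-Type header.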