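I am trying to learn how to download images and files from a website with a Java program. The code below is copied from http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html. The program is supposed to print the HTML source of the page at the given URL.
Quoting the tutorial: "When you run the program, you should see the HTML commands and textual content from the HTML file located at http://www.oracle.com/ in your command window."
My problem is that it works for some websites but not for interfacelift.com. It prints nothing from that site, and I would like to understand why.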
import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://interfacelift.com/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Answer 0 (score: 0)
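I suspect the site refuses to send you anything because it does not recognize your client as a web browser. Some sites do not like automated web scrapers, such as your program, reading their pages, so they choose to block them.
When I make the same request from Python, I get a 403 Forbidden error. I would expect your Java application to run into the same error: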
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> urllib2.urlopen("http://interfacelift.com/").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 406, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
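
To check this from Java, one option is to use HttpURLConnection instead of URL.openStream(), print the status code, and send a browser-like User-Agent header. The following is only a sketch: the class name URLReaderWithUserAgent, the helper methods open and readBody, and the USER_AGENT value are made up for illustration, and it is an assumption (not something I have verified) that interfacelift.com decides based on that header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class URLReaderWithUserAgent {

    // Hypothetical browser-like value; the site may expect something else.
    public static final String USER_AGENT = "Mozilla/5.0 (compatible; URLReaderExample)";

    // Opens a connection; passing null keeps Java's default "Java/<version>" User-Agent.
    public static HttpURLConnection open(String address, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        return conn;
    }

    // Reads the body; for 4xx/5xx responses the content is on the error stream.
    public static String readBody(HttpURLConnection conn) throws IOException {
        InputStream stream = conn.getResponseCode() >= 400
                ? conn.getErrorStream()
                : conn.getInputStream();
        if (stream == null) {
            return ""; // the server sent no body at all
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String address = args.length > 0 ? args[0] : "http://interfacelift.com/";
        HttpURLConnection conn = open(address, USER_AGENT);
        // Printing the status first shows whether the server answered 200, 403, etc.
        System.out.println("HTTP status: " + conn.getResponseCode());
        System.out.print(readBody(conn));
        conn.disconnect();
    }
}
```

If the status is still 403 with the header set, the block is probably based on something other than the User-Agent, and this approach will not help.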