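While testing, I found that one particular website returns HTTP error code 406 ("Not Acceptable") when I try to retrieve it. The URL is http://thelastword.msnbc.msn.com/_news/2012/06/07/12109716-awesome-internets-thursday-edition
Here is my code (I am doing my best to make it look like a normal browser request):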
sourceURL = new URL("http://thelastword.msnbc.msn.com/_news/2012/06/07/12109716-awesome-internets-thursday-edition");
final HttpURLConnection connection = (HttpURLConnection) sourceURL.openConnection();
connection.setDoInput(true);
connection.setDoOutput(true);
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/rss+xml");
connection.setRequestProperty("Accept-Charset", "ISO-8859-1,utf-8");
connection.setRequestProperty("Accept-Language", "en-US,en");
connection.setRequestProperty("Accept-Encoding", "gzip");
connection.setRequestProperty("User-Agent",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21");
connection.setRequestProperty("Host",
        sourceURL.getHost() + (sourceURL.getPort() != -1 ? ":" + sourceURL.getPort() : ""));
System.out.println("Response code: "+connection.getResponseCode());
Why does this web server respond with this error? The server is apparently Apache 2.2.16.
Edit: it seems to work when I comment out this line:
connection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/rss+xml");
But why?
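One experiment that might narrow this down is sketched below. It assumes the 406 comes from content negotiation on the Accept header. By the HTTP definition, 406 means the server cannot produce a representation that matches the Accept headers of the request. Browsers such as Chrome normally end their Accept list with a wildcard like */*;q=0.8, which the list above lacks. As far as I know, HttpURLConnection also sends a default Accept header containing */* when none is set, which would be consistent with the commented-out line working. The class name AcceptFallbackSketch and the helpers withWildcardFallback and prepare are hypothetical names made up for illustration, and whether the wildcard actually avoids the 406 on this server is untested.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch: retry the request with a wildcard fallback in the Accept header.
final class AcceptFallbackSketch {

    // Appends a low-priority "*/*" media range unless the header already contains one.
    public static String withWildcardFallback(String accept) {
        if (accept == null || accept.trim().isEmpty()) {
            return "*/*";
        }
        for (String range : accept.split(",")) {
            // Drop parameters such as ";q=0.9" before comparing the media range.
            String type = range.split(";")[0].trim();
            if (type.equals("*/*")) {
                return accept;
            }
        }
        return accept + ",*/*;q=0.8";
    }

    // Builds (but does not yet send) a GET request carrying the adjusted Accept header.
    public static HttpURLConnection prepare(String url, String accept) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", withWildcardFallback(accept));
        return connection;
    }

    public static void main(String[] args) throws IOException {
        String accept = "text/html,application/xhtml+xml,application/rss+xml";
        System.out.println("Accept sent: " + withWildcardFallback(accept));
        // Pass the URL as the first argument to actually issue the request.
        if (args.length > 0) {
            HttpURLConnection connection = prepare(args[0], accept);
            System.out.println("Response code: " + connection.getResponseCode());
        }
    }
}
```

If the response code changes from 406 to 200 with the wildcard present, that would suggest the server wants to answer with a media type that is not in my explicit list.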