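I tried to scrape the HTML source of this webpage: http://www.mindef.gov.sg/content/imindef/press_room/official_releases/nr/2013/jan/22jan13_nr.html. But the input I get is wrong: it uses a different kind of HTML from what I see in the browser. It looks as though an HTTP request made from the browser and one made from the app get different responses.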
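import java.io.InputStream;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpProtocolParams;

String address = "http://www.mindef.gov.sg/content/imindef/press_room/official_releases/nr/2013/jan/22jan13_nr.html";
String result = "";
HttpClient httpclient = new DefaultHttpClient();
// httpclient.getParams().setParameter("http.protocol.single-cookie-header", true);
// Identify as the Android stock browser (a mobile User-Agent string)
HttpProtocolParams.setUserAgent(httpclient.getParams(), "Mozilla/5.0 (Linux; U; Android 2.2.1; en-ca; LG-P505R Build/FRG83) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1");
InputStream is = null;
HttpGet httpGet = new HttpGet(address);
HttpResponse response = httpclient.execute(httpGet);
HttpEntity entity = response.getEntity();
is = entity.getContent();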
Answer 0 (score: 0)
Try:
// url is the page address as a String
URLConnection cn = new URL(url).openConnection();
BufferedReader input = new BufferedReader(new InputStreamReader(cn.getInputStream()));
Then read the input stream.
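The answer above stops at "read the input stream". A minimal sketch of that last step follows. It assumes the page is UTF-8 encoded (an assumption; the real page may declare another charset) and that holding the whole page in one String is acceptable. `PageReader` and `readAll` are hypothetical names used only for illustration. Reading into a char buffer keeps the line breaks exactly as the server sent them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageReader {

    // Hypothetical helper: drains the whole stream into a String using the
    // given charset, keeping line terminators as they appear in the source.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass the page address as the first argument to fetch it.
        if (args.length > 0) {
            URLConnection cn = URI.create(args[0]).toURL().openConnection();
            // Assumption: the page is UTF-8 encoded.
            System.out.println(readAll(cn.getInputStream(), StandardCharsets.UTF_8));
        }
    }
}
```

The stream is closed by try-with-resources once it has been fully read, so the caller does not need to close it separately.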
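Answer 1 (score: 0)
It sounds like you are getting the mobile version of the site. If you extend the URL to include ?siteversion=pc, the site should serve you the page it gives to a desktop browser.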
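A sketch of the suggestion above, under two assumptions: that the site honours `siteversion=pc` (this comes from the answer and is not verified here), and that the server may also choose the mobile version from the `User-Agent` header, since the question's code sends an Android browser string. The desktop `User-Agent` value below is illustrative, not a required string. `DesktopFetch`, `withPcSiteVersion` and `openAsDesktop` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

public class DesktopFetch {

    // Illustrative desktop browser User-Agent (an assumption, not a required value).
    static final String DESKTOP_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Appends siteversion=pc, using '&' if the URL already has a query string.
    // Simplification: URLs containing a #fragment are not handled.
    static String withPcSiteVersion(String url) {
        String sep = url.contains("?") ? "&" : "?";
        return url + sep + "siteversion=pc";
    }

    // Creates (but does not yet connect) a connection that identifies as a desktop browser.
    static URLConnection openAsDesktop(String url) throws IOException {
        URLConnection cn = URI.create(url).toURL().openConnection();
        cn.setRequestProperty("User-Agent", DESKTOP_UA);
        return cn;
    }

    public static void main(String[] args) throws IOException {
        String address = withPcSiteVersion(
                "http://www.mindef.gov.sg/content/imindef/press_room/official_releases/nr/2013/jan/22jan13_nr.html");
        URLConnection cn = openAsDesktop(address);
        System.out.println(cn.getURL());
        // Calling cn.getInputStream() sends the request; read it as in the other answers.
    }
}
```

Trying the two changes separately (only the query parameter, then only the header) would show which one the server actually reacts to.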
Answer 2 (score: 0)
Try this:
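import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
StringBuilder builder = new StringBuilder();
String line = null;
HttpGet get = new HttpGet("http://www.url.com");
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(get);
InputStream is = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
// readLine() strips line terminators, so append one to keep the source's line breaks
while ((line = reader.readLine()) != null) builder.append(line).append('\n');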
The page source should then be in the builder variable.
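Answer 0 and Answer 2 both wrap the stream in new InputStreamReader(...) without a charset, so the bytes are decoded with the platform default rather than the charset the server declares. A minimal sketch of picking the charset from a `Content-Type` header value (for example `response.getEntity().getContentType().getValue()` with HttpClient, or `URLConnection.getContentType()`). Falling back to ISO-8859-1 mirrors the old HTTP/1.1 default for `text/*` types in RFC 2616; a page can also declare its charset in an HTML `<meta>` tag, which this sketch does not inspect. `ContentTypeCharset` and `charsetOf` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or the fallback if none is present or usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // ISO-8859-1 as the fallback follows the RFC 2616 default for text/* types.
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The returned Charset can then be passed to new InputStreamReader(is, charset) in place of the single-argument constructor.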