Grabbing HTML from a specific website on Android

Date: 2013-01-24 18:02:34

Tags: java android httpclient

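I tried grabbing the HTML source of http://www.mindef.gov.sg/content/imindef/press_room/official_releases/nr/2013/jan/22jan13_nr.html, but I am getting the wrong input: the HTML I receive is different from what I see in my browser. It seems that an HTTP request from a browser and one from the app get different kinds of response.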

String address = "http://www.mindef.gov.sg/content/imindef/press_room/official_releases/nr/2013/jan/22jan13_nr.html";
String result = "";
HttpClient httpclient = new DefaultHttpClient();

// httpclient.getParams().setParameter("http.protocol.single-cookie-header", true);
// Identify the client as an Android (mobile) browser
HttpProtocolParams.setUserAgent(httpclient.getParams(), "Mozilla/5.0 (Linux; U; Android 2.2.1; en-ca; LG-P505R Build/FRG83) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1");
InputStream is = null;
HttpGet httpGet = new HttpGet(address);

// Send the GET request and open the response body as a stream
HttpResponse response = httpclient.execute(httpGet);
HttpEntity entity = response.getEntity();
is = entity.getContent();

3 Answers:

Answer 0 (score: 0)

Try:

    URLConnection cn = new URL(url).openConnection();

    BufferedReader input = new BufferedReader(new InputStreamReader(cn.getInputStream()));

Then read from the input stream.
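The "read from the input stream" step could look roughly like the sketch below. The class name `PageFetchSketch` and the methods `readAll` and `fetch` are made up for illustration, and the code assumes the page is UTF-8 encoded and that Java 7 features (try-with-resources, `StandardCharsets`) are available on the target platform.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class PageFetchSketch {

    // Reads the whole stream as text, line by line.
    // Assumption: the content is UTF-8; a real page may declare another charset.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() drops the line terminator, so put one back
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Opens the URL and returns the response body as a string.
    static String fetch(String url) throws IOException {
        URLConnection cn = new URL(url).openConnection();
        try (InputStream in = cn.getInputStream()) {
            return readAll(in);
        }
    }
}
```

Calling `PageFetchSketch.fetch(url)` would then return the page source. On Android, network calls like this have to run off the main thread.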

Answer 1 (score: 0)

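It sounds like you are getting the mobile version of the site. If you extend the URL to include ?siteversion=pc, it should give you the page it would serve to the browser on your computer.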
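Extending the URL could be done with a small helper along these lines. This is only a sketch: `SiteVersionSketch` and `withQueryParam` are hypothetical names, the helper assumes the URL has no `#fragment` and that the name and value need no URL encoding, and whether the site actually honors `siteversion=pc` is not verified here.

```java
class SiteVersionSketch {

    // Appends name=value to the URL, using '?' if the URL has no query
    // string yet and '&' if it already has one.
    // Assumptions: no '#fragment' in the URL, and name/value are already URL-safe.
    static String withQueryParam(String url, String name, String value) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + name + "=" + value;
    }
}
```

For the page in the question, `SiteVersionSketch.withQueryParam(address, "siteversion", "pc")` would produce the same address with `?siteversion=pc` appended, which can then be passed to `HttpGet` as before.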

Answer 2 (score: 0)

Try this:

StringBuilder builder = new StringBuilder();
String line = null;
HttpGet get = new HttpGet("http://www.url.com");
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(get);
InputStream is = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
// readLine() strips the line terminator, so add it back to keep the page's line breaks
while ((line = reader.readLine()) != null) builder.append(line).append('\n');
reader.close();

The page source should then be in builder.