How to scrape information from an AJAX web page using HttpClient 4.3.x

Time: 2014-11-24 11:10:49

Tags: java httpclient

For example, I want to get the full content of https://play.google.com/store/apps

I found that, to show the next page in the browser, it POSTs this data:

    start=15&num=5&numChildren=10&pagTok=CA8QDxjh2ND3psHQ4pcB%3AS%3AANO1ljLBy5U&ipf=1&xhr=1
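To see exactly which fields the browser sent, the captured body can be URL-decoded back into name/value pairs. This is a minimal JDK-only sketch, assuming the captured body is `start=15&num=5&numChildren=10&pagTok=CA8QDxjh2ND3psHQ4pcB%3AS%3AANO1ljLBy5U&ipf=1&xhr=1`; the class `CapturedForm` and its methods are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that splits a captured form body back into fields
public class CapturedForm {

    // Parses "name=value&name=value" into an insertion-ordered map,
    // URL-decoding each name and value
    static Map<String, String> decode(String body) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty pairs, e.g. from "&&"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            fields.put(urlDecode(name), urlDecode(value));
        }
        return fields;
    }

    private static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the body the browser sent
        String captured = "start=15&num=5&numChildren=10"
                + "&pagTok=CA8QDxjh2ND3psHQ4pcB%3AS%3AANO1ljLBy5U&ipf=1&xhr=1";
        for (Map.Entry<String, String> e : decode(captured).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The `pagTok` line prints `pagTok = CA8QDxjh2ND3psHQ4pcB:S:ANO1ljLBy5U`: `%3A` is an encoded `:`, so the token continues past the first `%3A` and the whole value has to be sent.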

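Then I used:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.http.HttpEntity;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    // The method name is illustrative; getSSLHttpClient() is a helper defined
    // elsewhere (not shown) that returns a CloseableHttpClient
    String fetchNextPage() throws IOException {
        HttpPost httpPost = new HttpPost("https://play.google.com/store/apps");
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("start", "15"));
        params.add(new BasicNameValuePair("num", "5"));
        params.add(new BasicNameValuePair("numChildren", "10"));
        // Full token from the captured request, URL-decoded (%3A is ':'); UrlEncodedFormEntity re-encodes it
        params.add(new BasicNameValuePair("pagTok", "CA8QDxjh2ND3psHQ4pcB:S:ANO1ljLBy5U"));
        params.add(new BasicNameValuePair("ipf", "1"));
        params.add(new BasicNameValuePair("xhr", "1"));

        // Send the parameters as an application/x-www-form-urlencoded body
        httpPost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
        CloseableHttpResponse response = getSSLHttpClient().execute(httpPost);
        try {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                return EntityUtils.toString(entity, "UTF-8"); // reads and consumes the body
            }
            return null; // the response had no body
        } finally {
            response.close(); // releases the underlying connection
        }
    }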
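Before blaming the server, it can help to check that the body being sent matches the one the browser sent. `UrlEncodedFormEntity` produces an `application/x-www-form-urlencoded` body; the sketch below builds the same kind of body with the JDK's `URLEncoder` (assuming its output matches the entity's for these plain ASCII values) and compares it with the captured body. `FormBodyCheck`, `encode` and `params` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a form body and compares it with a captured one
public class FormBodyCheck {

    // Joins fields as "name=value&name=value", URL-encoding names and values
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(urlEncode(e.getKey())).append('=').append(urlEncode(e.getValue()));
        }
        return sb.toString();
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // ':' becomes "%3A"
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    // The parameters from the question, in the same order; pagTok is passed in
    static Map<String, String> params(String pagTok) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("start", "15");
        p.put("num", "5");
        p.put("numChildren", "10");
        p.put("pagTok", pagTok);
        p.put("ipf", "1");
        p.put("xhr", "1");
        return p;
    }

    public static void main(String[] args) {
        String captured = "start=15&num=5&numChildren=10"
                + "&pagTok=CA8QDxjh2ND3psHQ4pcB%3AS%3AANO1ljLBy5U&ipf=1&xhr=1";
        // Token cut off at the first %3A
        System.out.println(encode(params("CA8QDxjh2ND3psHQ4pcB")).equals(captured)); // false
        // Full decoded token
        System.out.println(encode(params("CA8QDxjh2ND3psHQ4pcB:S:ANO1ljLBy5U")).equals(captured)); // true
    }
}
```

A matching body only rules out a malformed request; it does not make HttpClient execute any JavaScript the response contains.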

But in the end I can't get the web document from Google Play; what comes back is some JavaScript. What's wrong?

1 answer:

Answer 0 (score: 0)

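Apache HttpClient only fetches the page content; it does not execute JavaScript. If you need to execute the JavaScript on this page, you can use HtmlUnit. Here is an example.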