Java - HTTP client returns the response before the page has fully loaded

Time: 2018-04-21 08:23:33

Tags: java httpclient httpresponse

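I have to read the content of a certain field from a web page. I have been told that I need to fetch the entire page and then extract the text from the HTML content. I am using the program below to get the page's HTML. The problem is that the web page takes a few seconds to load the actual text value I want to read, even though the rest of the static page components load earlier. My program returns the HTML after the static components have loaded but before my value has loaded, so the final HTML I get contains the "loading in progress" image instead of the actual value. Can anyone tell me what changes this program needs so that it waits until the page is fully loaded?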

    // prepare the POST (login) and GET (value page) methods
    HttpPost post = new HttpPost("https://..../login");
    HttpGet httpget = new HttpGet("https://...../value#/123");

    // add parameters to the post method
    List<NameValuePair> parameters = new ArrayList<NameValuePair>();
    parameters.add(new BasicNameValuePair("username", "<name>"));
    parameters.add(new BasicNameValuePair("password", "<password>"));
    try {
        UrlEncodedFormEntity sendEntity = new UrlEncodedFormEntity(parameters, HTTP.DEF_CONTENT_CHARSET);
        post.setEntity(sendEntity);

        // create the client and execute the post method
        HttpClient client = HttpClientBuilder.create().build();

        HttpResponse postResponse = client.execute(post);
        System.out.println("Statusline: " + postResponse.getStatusLine());

        // output the response from the POST
        // (getStringFromInputStream is a helper method not shown here)
        System.out.println(getStringFromInputStream(postResponse.getEntity().getContent()));

        // release the POST connection
        EntityUtils.consume(postResponse.getEntity());

        // execute the GET with the same client
        HttpResponse getResponse = client.execute(httpget);
        System.out.println("Statusline: " + getResponse.getStatusLine());

        if (getResponse.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
            throw new IOException(getResponse.getStatusLine().toString());

        System.out.print(getStringFromInputStream(getResponse.getEntity().getContent()));
    } catch (IOException e) {
        e.printStackTrace();
    }
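One detail in the GET URL may explain the behavior. Everything after `#` is a URL fragment, and a fragment is never sent to the server; it is only visible to scripts running in a browser. A fragment such as `#/123` often suggests that the page fetches the value with JavaScript after the initial HTML arrives. An HTTP client only downloads the document the server returns and does not run scripts, so there may be no "fully loaded" state for it to wait for. The sketch below uses only `java.net.URI` to show what the server would actually receive; `example.com` is a hypothetical host standing in for the elided one, and `withoutFragment` is an illustrative helper, not part of any library.

```java
import java.net.URI;

class FragmentDemo {

    /** Returns the URL as the server sees it: everything before the '#'. */
    static String withoutFragment(String url) {
        URI uri = URI.create(url);
        // getFragment() is null when the URL has no '#' part at all
        return uri.getFragment() == null ? url : url.substring(0, url.indexOf('#'));
    }

    public static void main(String[] args) {
        // Hypothetical host; the real one is elided in the question
        String url = "https://example.com/value#/123";
        URI uri = URI.create(url);

        System.out.println("path:           " + uri.getPath());
        System.out.println("fragment:       " + uri.getFragment());
        System.out.println("sent to server: " + withoutFragment(url));
    }
}
```

If this is what is happening, two directions seem worth trying: find the request that the page's script makes for the value (for example in the browser's network tab) and call that URL directly with the same logged-in client, or load the page with a tool that drives a real or headless browser.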

1 answer:

Answer 0 (score: -2)

You can also use the Jsoup library; see http://jsoup.org
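Note that Jsoup parses the HTML it is given but does not execute JavaScript, so on its own it would likely return the same loading placeholder. Whichever tool ends up fetching the page, the general idea behind "waiting until it is loaded" is to re-check for the value until it appears or a timeout expires, which is roughly what explicit waits in browser-automation tools do. Below is a minimal pure-Java sketch of such a polling loop. `waitFor` and the `probe` supplier are hypothetical names for illustration, and the probe here is a stand-in that simply becomes ready on its third call instead of fetching a real page.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

class PollUntilReady {

    /**
     * Calls probe repeatedly until it yields a value or the timeout elapses.
     * Returns Optional.empty() on timeout or if the thread is interrupted.
     */
    static <T> Optional<T> waitFor(Supplier<Optional<T>> probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<T> result = probe.get();
            if (result.isPresent()) {
                return result;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // timed out without seeing a value
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "fetch and look for the value": empty on the first two calls
        int[] calls = {0};
        Supplier<Optional<String>> probe =
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("42");

        Optional<String> value = waitFor(probe, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(value.orElse("timed out"));
    }
}
```

In a real program the probe would wrap whatever check is appropriate, for example a call to the endpoint that serves the value, returning an empty `Optional` while the value is still missing.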