How to copy the contents of an HTML div from a website using Java

Time: 2016-12-09 09:19:49

Tags: java html parsing web-scraping jsoup

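I am trying to write a function in Java that copies the HTML code of a div from a URL. The data in question comes from http://cdn.espn.com/sports/scores#completed, but when I read the page into my function with an I/O stream, the data is not there. When I right-click, choose Inspect, and press Ctrl+F to search for "completed-soccer", the data itself is visible. It appears in the inspector, but my code does not retrieve it at all. This is the code I am using.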

package project;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;


public class DownloadPage {

    public static void main(String[] args) throws IOException {

        // Make a URL to the web page
        URL url = new URL("http://cdn.espn.com/sports/scores#completed-soccer");

        // Get the input stream through URL Connection
        URLConnection con = url.openConnection();
        InputStream is = con.getInputStream();


        BufferedReader br = new BufferedReader(new InputStreamReader(is));

        String line = null;

        // read each line and write to System.out
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }
}

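2 answers:

Answer 0 (score: 0)

If the data cannot be reached with a plain HTTP request, you will need a more capable tool such as Selenium WebDriver.

It drives a real browser, so you can navigate the page, execute its JavaScript, and inspect every element after the page has rendered.

Plenty of documentation and tutorials are available online.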
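Before bringing in a browser-automation tool, it may be worth confirming that the element really is missing from the raw response. The following is a minimal sketch of such a check, using only the standard library. The class name `RawHtmlCheck` and the methods `fetch` and `mentions` are hypothetical names chosen for illustration, and the marker string is simply the one searched for in the question. The check is rough: a marker can appear in a link or script without the div's content being present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a marker (such as an element id)
// occurs anywhere in the HTML returned by a plain HTTP request.
class RawHtmlCheck {

    // Returns true if the markup contains the marker text at all.
    static boolean mentions(String html, String marker) {
        return html != null && html.contains(marker);
    }

    // Downloads the page body as text, keeping line breaks.
    static String fetch(String address) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(address).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // URL and marker are taken from the question.
        String html = fetch("http://cdn.espn.com/sports/scores");
        if (mentions(html, "completed-soccer")) {
            System.out.println("Marker found in the raw HTML; an HTML parser may be enough.");
        } else {
            System.out.println("Marker not in the raw HTML; it is probably added by JavaScript.");
        }
    }
}
```

If the marker is absent, the content is most likely inserted by scripts after the page loads, which is the case where a browser-driving tool becomes necessary.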

Answer 1 (score: 0)

Try this code:

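    // Needs imports: java.io.BufferedReader, java.io.IOException, java.io.InputStream,
    // java.io.InputStreamReader, java.net.HttpURLConnection, java.net.URL
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://cdn.espn.com/sports/scores#completed-soccer");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        try
        {
            // Read from the connection opened above rather than opening a second one.
            InputStream in = urlConnection.getInputStream();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            StringBuilder result = new StringBuilder();
            String line;
            while((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add it back.
                result.append(line).append('\n');
            }
            System.out.println(result.toString());
        }
        finally
        {
            urlConnection.disconnect();
        }
    }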
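
One detail about the URL used above: everything after the `#` is a fragment identifier, which the browser handles on its own and does not send to the server. A plain HTTP request therefore fetches the same page with or without `#completed-soccer`. The sketch below illustrates this with `java.net.URI`; the class name `FragmentDemo` and its two methods are hypothetical names used only for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: splits a URL into the part the server receives
// and the fragment that stays on the client side.
class FragmentDemo {

    // Returns the text after '#', or null if the URL has no fragment.
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    // Rebuilds the URL without its fragment, i.e. what is actually requested.
    static String withoutFragment(String url) {
        URI uri = URI.create(url);
        try {
            return new URI(uri.getScheme(), uri.getAuthority(),
                    uri.getPath(), uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        String url = "http://cdn.espn.com/sports/scores#completed-soccer";
        System.out.println("Fragment (client side only): " + fragmentOf(url));
        System.out.println("Requested from the server:   " + withoutFragment(url));
    }
}
```

So adding the fragment to the URL does not make the server return the div. If the div is filled in by JavaScript, it will not appear in the downloaded text either way.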