Reading a website's page source with HttpURLConnection

Time: 2014-01-25 11:37:59

Tags: java cookies httpurlconnection

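I am trying to read a website's page source by opening the site repeatedly, each time with a different ID. I manage to read 5-6 pages, but after that I receive a service notice page instead: "Please activate browser cookies to view this site". I know I need to manage cookies somehow, but nothing I have tried works.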

Here is my code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// ids, link and path are fields of the enclosing class (not shown)
public void read_and_save_pages() {

    for (String id : ids) {
        try {

            // open the page for this id
            URL url = new URL(link + id);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();

            // set a browser-like user agent
            connection.setRequestProperty(
                    "User-Agent",
                    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36");

            // try-with-resources closes the reader and the writer even if copying fails
            // note: path does not include the id, so each iteration writes to the same file
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                         connection.getInputStream(), "windows-1255"));
                 BufferedWriter out = new BufferedWriter(new FileWriter(path + ".html"))) {

                // copy the page source to the file line by line
                String line = in.readLine();
                while (line != null) {
                    out.write(line + '\n');
                    line = in.readLine();
                }
            }

        } catch (Exception e) { // report the failure and continue with the next id
            System.err.println("Error: " + e.getMessage());
        }

    }
}
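
The cookie handling mentioned above could be sketched roughly as follows, using the standard `java.net.CookieManager`. Once a manager is installed as the JVM-wide default `CookieHandler`, `HttpURLConnection` stores the `Set-Cookie` headers it receives and sends matching cookies back on later requests. The class name `CookieAwareFetcher`, the helper names `installCookieManager` and `fetch`, and the `example.com` URI are illustrative assumptions, not part of the code above. Whether this is enough to get past the notice page depends on how the site checks for cookies, which the question does not say.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CookieAwareFetcher {

    // Installs an in-memory cookie store for every HttpURLConnection in this JVM.
    // Call it once, before the first connection is opened.
    static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Hypothetical helper: downloads one page as text using the given charset.
    static String fetch(String pageUrl, String charset) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(pageUrl).openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0");
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        } finally {
            connection.disconnect();
        }
        return page.toString();
    }

    // Offline demonstration of the cookie round trip, with no network access:
    // a Set-Cookie response header goes in, a Cookie request header comes out.
    public static void main(String[] args) throws IOException {
        CookieManager manager = installCookieManager();
        URI uri = URI.create("http://example.com/"); // placeholder address

        Map<String, List<String>> responseHeaders = new HashMap<>();
        responseHeaders.put("Set-Cookie", Arrays.asList("session=abc123; Path=/"));
        manager.put(uri, responseHeaders);

        Map<String, List<String>> requestHeaders =
                manager.get(uri, new HashMap<String, List<String>>());
        System.out.println(requestHeaders.get("Cookie"));
    }
}
```

In `read_and_save_pages()`, the equivalent change would be a single call to `CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL))` before the `for` loop, so that cookies set by the first responses are reused for the later IDs.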

0 answers:

No answers