How do I download a page in Java and save it as UTF-8?

Asked: 2013-04-21 10:42:19

Tags: java utf-8 httpurlconnection

This function sends a request to the server in UTF-8 and receives the response in UTF-8:

public String downloadPage(String _url, String _reqm, String _params) {
        try {
            if (_reqm == null || ("POST".equals(_reqm) && _params == null))
                throw new IOException();

            URL _myURL = null;

            // Compare strings with equals(); == only compares object references.
            if ("GET".equals(_reqm)) {
                _myURL = new URL(_params == null ? _url : _url + "?" + _params); //URLEncoder.encode(_params, "UTF-8")
            } else if ("POST".equals(_reqm)) {
                _myURL = new URL(_url);
            }

            HttpURLConnection pageConnection = (HttpURLConnection) _myURL.openConnection();
            pageConnection.setUseCaches(false);
            pageConnection.setDoOutput(true);
            pageConnection.setDoInput(true);
            pageConnection.setInstanceFollowRedirects(false);
            pageConnection.setRequestMethod(_reqm);

            pageConnection.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml");
            pageConnection.setRequestProperty("Accept-Charset", "UTF-8");
            pageConnection.setRequestProperty("charset", "UTF-8");
            pageConnection.setRequestProperty("Connection", "keep-alive");
            pageConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31");

            if ("POST".equals(_reqm)) {
                pageConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

                OutputStreamWriter writer = new OutputStreamWriter(pageConnection.getOutputStream());
                writer.write(_params); //URLEncoder.encode(_params, "UTF-8")
                writer.flush();
                writer.close();
            }

            BufferedReader reader = new BufferedReader(new InputStreamReader(pageConnection.getInputStream()));
            String inputLine;
            StringBuilder text = new StringBuilder();

            while ((inputLine = reader.readLine()) != null) {
                text.append(inputLine + "\n");
            }

            reader.close();
            return text.toString();
        } catch (IOException e) {
            e.printStackTrace();
            return "ERROR";
        }
    }
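
One thing worth checking in the function above, offered as a guess rather than a confirmed diagnosis: `new InputStreamReader(stream)` and `new OutputStreamWriter(stream)` without a charset argument both use the JVM's default charset, so the request headers alone do not make the response decode as UTF-8. The sketch below uses a hypothetical helper `readAll` (not part of the original code) and an in-memory stream standing in for `pageConnection.getInputStream()` to show the difference an explicit charset makes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the original code.
class ExplicitCharsetReader {

    // Reads the whole stream as text, decoding with the given charset
    // instead of the JVM default. Each line is terminated with '\n'.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Russian and Hungarian sample text, written with Unicode escapes so the
        // source file itself does not depend on any particular file encoding.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442, \u00e1rv\u00edzt\u0171r\u0151";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding the UTF-8 bytes as UTF-8 restores the text.
        String decodedAsUtf8 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        // Decoding the same bytes with a different charset garbles it.
        String decodedAsLatin1 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches:      " + decodedAsUtf8.equals(original + "\n"));
        System.out.println("ISO-8859-1 decode matches: " + decodedAsLatin1.equals(original + "\n"));
    }
}
```

In `downloadPage`, the equivalent change would be to pass a charset as the second argument to both the `InputStreamReader` and the `OutputStreamWriter` constructors.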

This function saves a UTF-8 string to a UTF-8 file:

public void writeFile(String _content, String _fileName) {
        try {
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(_fileName)), "UTF-8"));
            out.write(_content);
            out.flush();
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
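
Since `writeFile` already names UTF-8 explicitly, the file step is probably not where the text gets damaged. One way to confirm that is a round trip: write a known string, read the bytes back, and compare. The sketch below does this with `java.nio.file.Files`; the class and method names are hypothetical and the temporary file is only for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration; not part of the original code.
class Utf8FileRoundTrip {

    // Writes the string to the file, encoding it as UTF-8.
    static void writeUtf8(Path file, String content) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(content);
        }
    }

    // Reads the file's bytes back and decodes them as UTF-8.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("roundtrip", ".html");
        // Sample Russian and Hungarian text, as Unicode escapes.
        String content = "\u041f\u0440\u0438\u0432\u0435\u0442, \u00e1rv\u00edzt\u0171r\u0151";

        writeUtf8(file, content);
        System.out.println("Round trip preserved the text: " + readUtf8(file).equals(content));

        Files.delete(file);
    }
}
```

If the round trip succeeds with a string built in code but the saved page is still garbled, the string was most likely already wrong before it reached `writeFile`.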

I use the two together like this:

String downloadedPage = myTelecom.downloadPage("http://www.sbrf.ru/moscow/ru/", "GET", null);
myIO.writeFile(downloadedPage, "original.html");

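Despite all the encoding hints, I can't get it to work. It fails regardless of the request method, the domain, or the "Accept-Charset" and "charset" headers.

Neither Russian nor Hungarian text looks the way it should, and I don't know where I made the mistake.

What could be the problem?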
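
A further possibility, again only an assumption about the pages involved: the server may ignore `Accept-Charset` and send the page in another encoding, which it names in the `Content-Type` response header. A minimal sketch of reading that name follows; `ContentTypeCharset` and `fromContentType` are hypothetical names, and falling back to a caller-supplied charset is an arbitrary design choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the original code.
class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header is
    // missing, has no charset parameter, or names a charset this JVM lacks.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop the "charset=" prefix and any surrounding quotes.
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(fromContentType("text/html; charset=ISO-8859-1", utf8));
        System.out.println(fromContentType("text/html", utf8));
    }
}
```

With the connection from `downloadPage`, the header value would come from `pageConnection.getContentType()`, and the resulting `Charset` would be passed to the `InputStreamReader`. Pages that declare their encoding only in an HTML `<meta>` tag are not covered by this sketch.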

0 answers:

No answers