Question

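I fetch the response for a Wikipedia page and paste it into an HTML file. When I open the HTML file in a browser, text in languages other than English does not display correctly (I am using UTF-8). I am attaching a picture of the languages section as it appears in the HTML.

I tried several ways of getting the response in Java, shown below.

Way 1: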

    URL url = new URL ("https://en.wikipedia.org/wiki/Sachin_Tendulkar");
    byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
    //System.out.println("Host --------"+url.getHost());
    String encoding = new String (encodedBytes);

    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");
    connection.setRequestProperty("Accept-Charset", "UTF-8");
    connection.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    connection.setDoInput (true);
    connection.setRequestProperty  ("Authorization", "Basic " + encoding);
    connection.connect();

    InputStream content = (InputStream)connection.getInputStream();
    BufferedReader in   = new BufferedReader (new InputStreamReader (content));
    String line;

    while ((line = in.readLine()) != null) {
        String s = line.toString();
        System.out.println(s);
    }

I also tried the following code, but it does not display the non-English text from the wiki either:

            URL url;
            HttpURLConnection conn;
            BufferedReader rd;
            String line;
            StringBuilder result = new StringBuilder();
            try {
               url = new URL("https://en.wikipedia.org/wiki/Sachin_Tendulkar");
               conn = (HttpURLConnection) url.openConnection();
               conn.setRequestMethod("GET");
               conn.setRequestProperty("Accept-Charset", "UTF-8");
               conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");

               rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
               while ((line = rd.readLine()) != null) {
                  byte [] b = line.getBytes("UTF-8");
                  result.append(line);
                  System.out.println(result.append(line));
               }
               rd.close();
            } catch (Exception e) {
               e.printStackTrace();
            }

Answer 1

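A few points:

- Your code does not show how the response ends up in the HTML file. Are you just redirecting the process's standard output to a file? Make sure UTF-8 is also used when the output file is written.
- Why does System.out.println print the entire StringBuilder instance on every iteration of the read loop? Note that result.append(line) is also called twice per iteration, so every line is added twice.
- Why call line.getBytes() when its result is never used?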
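
For the first point, here is a minimal sketch of writing text to a file with an explicit charset instead of relying on redirected standard output, whose encoding depends on the platform default. The class name `Utf8FileWriter`, the method `save`, the temporary file, and the sample string are hypothetical and chosen only for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original question or answer.
class Utf8FileWriter {

    // Writes the text as UTF-8, regardless of the platform default charset.
    static void save(String text, Path target) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("wiki-sample", ".html");
        // Sample non-English content ("Sachin" in Devanagari), written with
        // Unicode escapes so the source file itself stays pure ASCII.
        String sample = "<p>\u0938\u091A\u093F\u0928</p>";
        save(sample, target);

        // Read the bytes back with the same charset to confirm the round trip.
        String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println("Round trip preserved the text: " + sample.equals(readBack));
    }
}
```

The same idea should apply to text collected in a StringBuilder: pass `result.toString()` to a writer that was opened with an explicit charset.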

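Edit: based on your comments, I really think the problem lies in the clipboard operation. Please try the code below, which stores the response directly in an output file.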

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HtmlDownloader {

        private static final String USER_AGENT = "Mozilla/5.0";
        private static final String ENCODING = "UTF-8";

        // Copies the raw response bytes to the output file. No text decoding
        // takes place, so no charset conversion can corrupt the content.
        public boolean download(String urlAddress, String outputFileName) {
            HttpURLConnection con = null;
            try {
                URL url = new URL(urlAddress);
                con = (HttpURLConnection) url.openConnection();
                con.setRequestMethod("GET");
                con.setRequestProperty("User-Agent", USER_AGENT);
                con.setRequestProperty("Accept-Charset", ENCODING);
                // try-with-resources closes both streams, even on failure.
                try (InputStream is = new BufferedInputStream(con.getInputStream());
                     OutputStream os = new BufferedOutputStream(
                             new FileOutputStream(outputFileName))) {
                    byte[] buffer = new byte[1024];
                    int len;
                    // read() returns -1 at the end of the stream.
                    while ((len = is.read(buffer)) != -1) {
                        os.write(buffer, 0, len);
                    }
                }
                return true;
            } catch (Exception e) {
                e.printStackTrace();
                return false;
            } finally {
                if (con != null) {
                    con.disconnect();
                }
            }
        }

        public static void main(String[] args) {
            HtmlDownloader d = new HtmlDownloader();
            if (d.download("https://en.wikipedia.org/wiki/Sachin_Tendulkar", "c:\\wiki.html"))
                System.out.println("SUCCESS");
            else
                System.out.println("FAIL");
        }
    }
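
Why a byte-for-byte copy is safe can be illustrated with a small sketch. It encodes a string as UTF-8, the way the page's bytes arrive, and decodes those bytes once with UTF-8 and once with ISO-8859-1. ISO-8859-1 stands in here for a non-UTF-8 platform default, which is what `new InputStreamReader(content)` in Way 1 of the question falls back to when no charset is given. The class name `CharsetMismatchDemo`, the method `decodeUtf8BytesAs`, and the sample string are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class CharsetMismatchDemo {

    // Encodes the text as UTF-8 and decodes the resulting bytes with the given charset.
    static String decodeUtf8BytesAs(String text, Charset decoder) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, decoder);
    }

    public static void main(String[] args) {
        // "Sachin" in Devanagari, as Unicode escapes to keep the source ASCII-only.
        String original = "\u0938\u091A\u093F\u0928";

        String right = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);
        String wrong = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches original: " + original.equals(right));
        System.out.println("ISO-8859-1 decode matches original: " + original.equals(wrong));
    }
}
```

The first line reports `true` and the second `false`: once the bytes have been decoded with the wrong charset, the text is garbled, whereas copying the bytes untouched leaves the decoding to the browser.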

How to get a response containing text other than English using Java
