Question

I am reading an HTTP response from a Perl page inside a servlet, like this:

public String getHTML(String urlToRead) {
        URL url;
        HttpURLConnection conn;
        BufferedReader rd;
        String line;
        String result = "";
        try {
           url = new URL(urlToRead);
           conn = (HttpURLConnection) url.openConnection();
           conn.setRequestMethod("GET");
           conn.setRequestProperty("Accept-Charset", "UTF-8");
           conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");

           rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
           while ((line = rd.readLine()) != null) {
              byte [] b = line.getBytes();
              result += new String(b, "UTF-8");
           }
           rd.close();
        } catch (Exception e) {
           e.printStackTrace();
        }
        return result;
   }

I display the result using the following code:

response.setContentType("text/plain; charset=UTF-8");

        PrintWriter out = new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF-8"), true);


        try {

            String query = request.getParameter("query");
            String type = request.getParameter("type");

            String res = getHTML(url);
            out.write(res);

        } finally {            
            out.close();
        }

But the response is still not encoded as UTF-8. What am I doing wrong?

Thanks in advance.

Answer 1

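The call to line.getBytes() looks suspicious. If you are sure the returned content is UTF-8 encoded, it should be line.getBytes("UTF-8"). Also, I'm not sure why it is even necessary. The typical way to get data out of a BufferedReader is to use a StringBuilder and keep appending each String returned by readLine to the result. Converting back and forth between byte[] and String is unnecessary.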

Change result to a StringBuilder and do this:

result.append(line);
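
For illustration, here is a minimal sketch of the StringBuilder approach described above. The class name Utf8BodyReader and the method readUtf8 are made up for this example, and a ByteArrayInputStream stands in for conn.getInputStream() so it runs without a network connection. Like the question's code, it drops the line terminators that readLine() strips.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
class Utf8BodyReader {

    // Decodes the stream as UTF-8 exactly once (in the InputStreamReader)
    // and collects the lines in a StringBuilder. No byte[] round trip.
    static String readUtf8(InputStream in) {
        StringBuilder result = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                result.append(line); // line terminators are dropped by readLine()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Stand-in for conn.getInputStream(): UTF-8 bytes with non-ASCII characters.
        byte[] body = "h\u00e9llo\nw\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(body)));
    }
}
```

In the question's getHTML method, the same loop would read from conn.getInputStream() and return result.toString().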

Answer 2

You break the character encoding conversion chain here:

       while ((line = rd.readLine()) != null) {
          byte [] b = line.getBytes();  // NOT UTF-8
          result += new String(b, "UTF-8");
       }

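From the String#getBytes() javadoc:

"Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array."

And the default charset is probably not UTF-8.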
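
To make the effect concrete, here is a small sketch that mimics the question's loop on a platform whose default charset is not UTF-8. ISO-8859-1 is only an assumption chosen for the demonstration, and the class name CharsetMismatchDemo is made up; the actual default on the asker's server is not stated.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class CharsetMismatchDemo {

    // Encodes with one charset and decodes with another, as the question's
    // loop does when the platform default is not UTF-8.
    static String roundTrip(String line) {
        // Stand-in for line.getBytes() on a platform defaulting to ISO-8859-1 (assumed).
        byte[] b = line.getBytes(StandardCharsets.ISO_8859_1);
        // The bytes are then misread as UTF-8.
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        String garbled = roundTrip(original);
        // Pure ASCII text survives the mismatch; the accented character does not.
        System.out.println("ASCII intact:   " + roundTrip("abc").equals("abc"));
        System.out.println("Accents intact: " + garbled.equals(original));
    }
}
```

This is why the problem tends to go unnoticed until the response contains non-ASCII characters.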

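But why do all the conversions in the first place? Just read the raw bytes from the source and write the raw bytes to the consumer. It is UTF-8 all the way through.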
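
A rough sketch of that raw-byte pass-through might look like the following. The class name RawByteCopy and its methods are invented for the example, and in-memory streams stand in for conn.getInputStream() and response.getOutputStream() so it can run on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
class RawByteCopy {

    // Copies bytes unchanged from source to consumer. No String is ever
    // created, so no charset is involved and nothing can be mis-decoded.
    static void copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the copy between in-memory streams and returns what was written.
    static byte[] demo(byte[] source) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(source), sink);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, demo(utf8)));
    }
}
```

In the servlet, this would be called as copy(conn.getInputStream(), response.getOutputStream()) after response.setContentType("text/plain; charset=UTF-8"), assuming the Perl page really does emit UTF-8.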

Answer 3

I ran into the same problem in a different scenario, but I believe it will work if you simply do this:

byte[] b = line.getBytes(UTF8_CHARSET);

in the while loop:

while ((line = rd.readLine()) != null) {
          byte [] b = line.getBytes();  // NOT UTF-8
          result += new String(b, "UTF-8");
       }
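
The answer does not show how UTF8_CHARSET is declared. One plausible reading, assumed here, is a Charset constant such as StandardCharsets.UTF_8 passed to String.getBytes(Charset). The class name ExplicitCharsetRoundTrip is made up for the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class ExplicitCharsetRoundTrip {

    // Assumed definition: the answer does not say how UTF8_CHARSET is declared.
    static final Charset UTF8_CHARSET = StandardCharsets.UTF_8;

    // Encodes and decodes with the same explicit charset, so the text
    // comes back unchanged regardless of the platform default.
    static String roundTrip(String line) {
        byte[] b = line.getBytes(UTF8_CHARSET);
        return new String(b, UTF8_CHARSET);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u4f60\u597d";
        System.out.println(roundTrip(original).equals(original));
    }
}
```

With matching charsets on both sides the round trip is lossless, which also means it does no useful work; appending line directly would give the same result.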

Answer 4

In my case, I had to add some extra configuration.

Previously, I wrote it like this:

try (PrintStream printStream = new PrintStream(response.getOutputStream())) {
        printStream.print(pageInjecting);
}

I changed it to:

try (PrintStream printStream = new PrintStream(response.getOutputStream(), false, "UTF-8")) {
        printStream.print(pageInjecting);
}
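
To see what the third constructor argument changes, here is a small sketch that prints the same text through PrintStream with different encodings and inspects the bytes. A ByteArrayOutputStream stands in for response.getOutputStream(), and the class name PrintStreamEncodingDemo and method printAs are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of the original answer.
class PrintStreamEncodingDemo {

    // Prints text through a PrintStream that uses the named encoding and
    // returns the bytes that reached the underlying stream.
    static byte[] printAs(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for response.getOutputStream()
        try (PrintStream printStream = new PrintStream(sink, false, encoding)) {
            printStream.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray(); // the stream was flushed when it was closed
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8 and one byte in ISO-8859-1.
        System.out.println(printAs("\u00e9", "UTF-8").length);
        System.out.println(printAs("\u00e9", "ISO-8859-1").length);
    }
}
```

Without the explicit encoding, PrintStream falls back to the platform default, which is the same trap as the no-argument getBytes() call.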

UTF-8 response using a servlet
