How do I determine the encoding of a ByteArrayOutputStream?

Time: 2013-12-23 13:21:40

Tags: java

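I need to convert a ByteArrayOutputStream to a String, but I cannot figure out the encoding. Can anyone help? I tried the ICU4J library, but it only seems to work on input streams. A way to go from a byte array to an input stream would also be fine.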

Below is a sample of what I get when I use the default encoding. Clearly, the newline sequences should not be there.

<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n
<html>
   \n   
   <head>
      \n        
      <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">
      \n            
      <style type=\"text/css\">\n                   .style_0 { font-family: sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 10pt; color: black; text-indent: 0em; letter-spacing: normal; word-spacing: normal; text-transform: none; white-space: normal; line-height: normal;}\n                    .style_1 { height: 5.062in; width: 8.01in;}\n           </style>
      \n            <script type=\"text/javascript\">\n          //<![CDATA[\n             function redirect(target, url){\n                   if (target =='_blank'){\n                       open(url);\n                }\n                 else if (target == '_top'){\n                       window.top.location.href=url;\n                 }\n                 else if (target == '_parent'){\n                    location.href=url;\n                }\n                 else if (target == '_self'){\n                      location.href =url;\n                   }\n                 else{\n                     open(url);\n                }\n                }\n            //]]>\n            </script>\n     
   </head>
   \n       <body class=\"style_0\" style=\" margin:0px;\">\n           <table cellpadding=\"0\" style=\"empty-cells: show; border-collapse:collapse; width:8in; overflow: hidden; table-layout:fixed;\">\n             
   <col>
   </col>\n             
   <tr>
      \n                    
      <td></td>
      \n                
   </tr>
   \n               
   <tr>
      \n                    
      <td valign=\"top\"></td>
      \n                
   </tr>
   \n               
   <tr>
      \n                    
      <td>
         \n                     
         <div style=\"overflow:hidden; height:0.5in\">\n                            <div style=\" overflow:hidden;\">Dec 23, 2013, 7:11 PM</div>
         \n                     </div>\n                    
      </td>
      \n                
   </tr>
   \n           </table>\n              
   <hr style=\"color:red\"/>
   \n               
   <div style=\"color:red\">
   \n                   
   <div>The following items have errors:\n          </div>
   \n           <br>\n                      
   <div>
   \n                           
   <div  id=\"error_title\" style=\"text-decoration:underline\">
   Chart (id = 12):

\n

2 Answers:

Answer 0: (score: 4)

  

I tried the ICU4J library, but it only seems to work on input streams.

You can get the byte array from the ByteArrayOutputStream with toByteArray(), wrap it in a ByteArrayInputStream, and pass that stream to the ICU4J method.


(Keep in mind that ICU4J may guess the wrong encoding, and that the bytes may not represent text in any known encoding at all.)
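The wrapping step can be sketched with the standard library alone. This is a minimal illustration, not the ICU4J API: `guessCharset` is a hypothetical stand-in for a real detector that only checks whether the bytes are valid UTF-8 and otherwise falls back to ISO-8859-1, and the class and method names are assumptions made for the example. With ICU4J on the classpath, the same `ByteArrayInputStream` would be handed to the library's detector instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StreamWrapDemo {

    // Hypothetical stand-in for a real detector such as ICU4J.
    // It reads the whole stream and returns UTF-8 if the bytes decode
    // strictly as UTF-8; otherwise it falls back to ISO-8859-1.
    static Charset guessCharset(InputStream in) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            copy.write(buf, 0, n);
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(copy.toByteArray()));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    // Takes the bytes out of the output stream, wraps them in an input
    // stream for the detector, then decodes them with the guessed charset.
    static String decode(ByteArrayOutputStream out) throws IOException {
        byte[] bytes = out.toByteArray();
        Charset charset = guessCharset(new ByteArrayInputStream(bytes));
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write("<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(out));
    }
}
```

Because `toByteArray()` returns a copy, the same bytes can be given to the detector and then decoded without resetting any stream.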

Answer 1: (score: 1)

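java.nio.charset.CharsetDecoder has a detectedCharset() method, which is meant to report the charset that an auto-detecting decoder has identified for the bytes it was given. Unfortunately, the CharsetDecoder implementations in Java SE 7 that Charset.newDecoder() returns for the standard charsets such as UTF-8 are not auto-detecting decoders, so calling detectedCharset() on them throws an UnsupportedOperationException.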
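A short check makes this behaviour visible. The sketch below assumes a UTF-8 decoder obtained from `StandardCharsets.UTF_8.newDecoder()`; the class and helper names are made up for the example. `isAutoDetecting()` can be queried first to find out whether calling `detectedCharset()` makes sense at all.

```java
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

final class DetectedCharsetDemo {

    // Returns true if detectedCharset() is unsupported by this decoder.
    static boolean detectedCharsetThrows(CharsetDecoder decoder) {
        try {
            decoder.detectedCharset();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        System.out.println("auto-detecting: " + decoder.isAutoDetecting());
        System.out.println("detectedCharset() unsupported: "
                + detectedCharsetThrows(decoder));
    }
}
```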