SAX: UTF-8 decoding/encoding fails

Time: 2014-02-24 17:25:38

Tags: java xml encoding utf-8 sax

I am using SAX to parse an XML file and then write it back out.

Both the parsing step and the writing step corrupt the UTF-8 encoding.

Sample XML:

<AddressInfo>
    <City name="Antalya" code="07">
      <District name="Döşemealtı">
        <Zip code="01680" />
      </District>
    </City>
</AddressInfo>

Result:

 <AddressInfo>
    <City name="Antalya" code="07">
        <District name="Döşemealtı">
            <Zip code="01680"/>
        </District>
     </City>
 </AddressInfo>

I tried giving the SAXParser an InputStreamReader wrapped in an InputSource as its input, but it did not work:

    SAXParserFactory parserFactor = SAXParserFactory.newInstance();
    SAXHandler handler = new SAXHandler();
    SAXParser parser;
    try {
        // dis is a DataInputStream
        parser = parserFactor.newSAXParser();
        InputStreamReader inputReader = new InputStreamReader(dis, Charset.forName("UTF-8"));
        InputSource inputSource = new InputSource();
        inputSource.setCharacterStream(inputReader);
        inputSource.setEncoding("UTF-8");
        // ignoring the inputSource and passing the DataInputStream directly
        parser.parse(dis, handler);
        // also tried with inputSource, no joy
        // parser.parse(inputSource, handler);

...

What could be going wrong? Any ideas?

Cheers

Note: the input XML has no XML declaration, such as

`<?xml version="1.0" encoding="UTF-8"?>`

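1 answer:

Answer 0: (score: 0)

Try reading the input as a character stream and passing it to the parser through an InputSource. UTF-8 has to be decoded into characters as it is read. An InputStream only delivers raw bytes and has no encoding of its own.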
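To illustrate the difference between bytes and characters, here is a minimal sketch that decodes the same UTF-8 bytes through an InputStreamReader twice, once with the right charset and once with the wrong one. The class name `DecodeDemo` and the helper `decode` are made up for this illustration and are not part of the question's code. The string is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Hypothetical helper: reads all characters from the bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Döşemealtı" from the sample XML, written with escapes
        String original = "D\u00f6\u015femealt\u0131";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset restores the original string
        System.out.println("UTF-8 round trip intact: " + right.equals(original));
        // Decoding with a single-byte charset turns each byte into its own character
        System.out.println("ISO-8859-1 result intact: " + wrong.equals(original));
        System.out.println("chars: " + original.length() + ", bytes: " + utf8.length
                + ", misdecoded chars: " + wrong.length());
    }
}
```

If the misdecoded string is later written out as UTF-8, the damage becomes permanent, which may be what happens when both parsing and writing rely on a platform default charset.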

Something like this should help. If you are parsing XML stored in a CLOB, make sure you read it with getCharacterStream():

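import java.io.BufferedReader;
import java.io.Reader;
import org.xml.sax.InputSource;

// clob is a java.sql.Clob; when it comes from a database, read it as a character stream
Reader reader = clob.getCharacterStream();
BufferedReader bufferedReader = new BufferedReader(reader);
InputSource inputSource = new InputSource(bufferedReader);
// setEncoding has no effect once a character stream has been supplied
inputSource.setEncoding("UTF-8");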
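For the plain InputStream case in the question, the same idea can be sketched end to end: decode the bytes explicitly as UTF-8, hand the Reader to the parser through an InputSource, and write the output through an OutputStreamWriter that also names UTF-8. This is only a sketch under assumptions: `SaxUtf8RoundTrip`, `DistrictHandler`, `parseDistrictName`, and `writeDistrict` are hypothetical names, the input is an in-memory byte array rather than the question's DataInputStream, and the writer emits a single element without XML escaping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxUtf8RoundTrip {

    // Hypothetical handler: remembers the "name" attribute of the District element.
    static class DistrictHandler extends DefaultHandler {
        String districtName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("District".equals(qName)) {
                districtName = attrs.getValue("name");
            }
        }
    }

    // Parses UTF-8 encoded XML bytes through a Reader, so the charset is never guessed.
    static String parseDistrictName(byte[] xmlUtf8) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DistrictHandler handler = new DistrictHandler();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlUtf8), StandardCharsets.UTF_8)) {
            parser.parse(new InputSource(reader), handler);
        }
        return handler.districtName;
    }

    // Writes one element as UTF-8 bytes; assumes the name needs no XML escaping.
    static byte[] writeDistrict(String name) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write("<District name=\"" + name + "\"/>");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String name = "D\u00f6\u015femealt\u0131"; // "Döşemealtı", written with escapes
        // Like the question's input, this document has no XML declaration
        String xml = "<AddressInfo><City name=\"Antalya\" code=\"07\">"
                + "<District name=\"" + name + "\"><Zip code=\"01680\"/></District>"
                + "</City></AddressInfo>";

        String parsed = parseDistrictName(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("parsed name intact: " + name.equals(parsed));

        byte[] written = writeDistrict(parsed);
        byte[] expected = ("<District name=\"" + name + "\"/>").getBytes(StandardCharsets.UTF_8);
        System.out.println("written bytes intact: " + Arrays.equals(written, expected));
    }
}
```

Naming the charset on both the reading and the writing side keeps the platform default out of the picture. A FileWriter or PrintStream created without a charset on the output side could still reintroduce the corruption even when parsing is correct.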