"Invalid byte 2 of 3-byte UTF-8 sequence" exception during XML transformation

Date: 2015-11-12 22:19:52

Tags: java xml xslt encoding

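I am trying to transform some XML files that are encoded in US-ASCII. The transformer works fine with UTF-8 and ISO-8859-1 input, but not with US-ASCII. I also tried the FileInputStream approach, but I am not sure whether an encoding has to be specified in the StreamResult part.

Here is my code: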

    // `file` (the input XML file) and `outPath` (the output location prefix) are defined elsewhere.
    File xsl = new File("src/xsl/prism.xsl");
    String fname = file.getName();

    TransformerFactory factory = TransformerFactory.newInstance();
    Source xslt = new StreamSource(xsl);

    try {
        Transformer transformer = factory.newTransformer(xslt);
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        transformer.setOutputProperty(OutputKeys.ENCODING, "us-ascii");
        Source text = new StreamSource(file.getCanonicalFile());

        System.out.println("Transformed " + fname + "\n");
        transformer.transform(text, new StreamResult(new File(outPath + file.getName())));
    } catch (TransformerException | IOException e) {
        System.out.println("Error in: " + fname + "\n");
        e.printStackTrace();
    }

This is the exception that is thrown:

   com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:464)
    at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:252)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:565)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:748)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:359)
    at com.rogers.ramraja.XSLT.transform(XSLT.java:66)
    at com.rogers.ramraja.XSLT.main(XSLT.java:41)
---------
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:687)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:408)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1728)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1400)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:458)
    at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:252)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:565)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:748)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:359)
    at com.rogers.ramraja.XSLT.transform(XSLT.java:66)
    at com.rogers.ramraja.XSLT.main(XSLT.java:41)

1 Answer:

Answer 0 (score: 1)

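The error occurs while the transformer is loading the source XML document. It tries to read the document as UTF-8, which is the default when there is no XML declaration or when the declaration has no encoding attribute.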
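The default described above can be reproduced on a small in-memory document. The following is only a sketch: the class name `DefaultEncodingDemo`, the helper `parseRootText`, and the sample text are made up for illustration, and the document is parsed directly with the JDK's `DocumentBuilder` rather than through the transformer from the question. The same ISO-8859-1 bytes are parsed twice, once without and once with an encoding declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DefaultEncodingDemo {

    /** Parses the bytes and returns the root element's text, or null if the parser rejects them. */
    static String parseRootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // A malformed byte sequence surfaces here as a parse failure.
            System.out.println("Parse failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample content; \u00e9 becomes the single byte 0xE9 in ISO-8859-1.
        String body = "<a>caf\u00e9 au lait</a>";

        // No declaration: the parser assumes UTF-8, and 0xE9 followed by a space is malformed.
        byte[] noDeclaration = body.getBytes(StandardCharsets.ISO_8859_1);

        // Same bytes, but the declaration tells the parser which encoding to use.
        byte[] withDeclaration = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("Without declaration: " + parseRootText(noDeclaration));
        System.out.println("With declaration:    " + parseRootText(withDeclaration));
    }
}
```

If the real encoding of the source files is known, declaring it in the XML declaration is one way to let the parser decode them correctly. Note that `OutputKeys.ENCODING` only controls how the result is serialized, not how the source is read.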

The error is evidently thrown because the source file is not correctly encoded in UTF-8.

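Now, any file encoded in 7-bit ASCII is also valid UTF-8. Therefore the source document cannot be a pure 7-bit ASCII file: it must contain at least one byte above 0x7F.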
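One way to confirm this for a given file is to look for the first byte above 0x7F and to run a strict UTF-8 decode. This is a minimal diagnostic sketch, not part of the original code: the class name `EncodingCheck` and its method names are hypothetical, and the file path is taken from the command line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class EncodingCheck {

    /** Returns the offset of the first byte with the high bit set, or -1 if the data is pure 7-bit ASCII. */
    static int firstNonAsciiOffset(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    /** Returns true if the data decodes as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java EncodingCheck <xml-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstNonAsciiOffset(data);
        if (offset < 0) {
            System.out.println("Pure 7-bit ASCII (and therefore valid UTF-8).");
        } else {
            System.out.printf("First non-ASCII byte 0x%02X at offset %d; valid UTF-8: %b%n",
                    data[offset] & 0xFF, offset, isValidUtf8(data));
        }
    }
}
```

A file that reports a non-ASCII byte and fails the UTF-8 check is most likely stored in a single-byte encoding such as ISO-8859-1 or Windows-1252, although the check itself cannot tell which one.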