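I am trying to transform some XML files that are encoded as us-ascii.
The Transformer works fine with utf-8 / iso-8859-1 input, but not with us-ascii.
I also tried the FileInputStream approach, but I am not sure whether an encoding has to be specified on the StreamResult side.
Here is my code: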
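import java.io.File;
import java.io.IOException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// `file` (the input XML) and `outPath` are defined elsewhere in the class.
File xsl = new File("src/xsl/prism.xsl");
String fname = file.getName();
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(xsl);
try {
    Transformer transformer = factory.newTransformer(xslt);
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    // Controls the encoding of the output only; it does not affect how the source is read.
    transformer.setOutputProperty(OutputKeys.ENCODING, "us-ascii");
    Source text = new StreamSource(file.getCanonicalFile());
    transformer.transform(text, new StreamResult(new File(outPath + file.getName())));
    // Report success only after the transform has actually completed.
    System.out.println("Transformed " + fname + "\n");
} catch (TransformerException | IOException e) {
    System.out.println("Error in: " + fname + "\n");
    e.printStackTrace();
}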
Here is the exception that is thrown:
com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:464)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:252)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:565)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:748)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:359)
at com.rogers.ramraja.XSLT.transform(XSLT.java:66)
at com.rogers.ramraja.XSLT.main(XSLT.java:41)
---------
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:687)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:408)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1728)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1400)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:458)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:252)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:565)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:748)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:359)
at com.rogers.ramraja.XSLT.transform(XSLT.java:66)
at com.rogers.ramraja.XSLT.main(XSLT.java:41)
Answer 0 (score: 1)
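The error occurs while the transformer is loading the source XML document, not while it is writing the result. The parser is reading the file as UTF-8, which is the default when there is no XML declaration or the declaration has no encoding attribute.
The exception is thrown because the source file is not validly encoded in UTF-8.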
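One way to test this claim on a suspect file is to decode its bytes strictly and look for the first byte outside the ASCII range. The following is a minimal sketch, not part of the original code: the class name `EncodingCheck` and both helper methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class EncodingCheck {

    // Returns true only if the bytes decode as strict UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns the offset of the first byte outside 7-bit ASCII, or -1 if there is none.
    static int firstNonAsciiOffset(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EncodingCheck <file.xml>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("valid UTF-8: " + isValidUtf8(data));
        System.out.println("first non-ASCII byte offset: " + firstNonAsciiOffset(data));
    }
}
```

A single-byte accented character from a Latin-1 style encoding is a plausible culprit, though that is only a guess about this particular file: a byte such as 0xE9 looks like the lead byte of a 3-byte UTF-8 sequence, and the byte after it is usually not a valid continuation byte.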
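Now, any file encoded in 7-bit ASCII is also valid UTF-8. Therefore the source document cannot really be a 7-bit ASCII file: it must contain at least one byte outside the ASCII range.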
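If the real encoding of the input is known, one possible workaround is to decode the bytes yourself and hand the parser a `Reader`, since a parser given a character stream does not do its own byte-level encoding detection. The sketch below assumes the files are actually ISO-8859-1, which is only a guess and must be checked against the real data. The class name `ExplicitEncodingTransform` is hypothetical, and an identity transform stands in for the stylesheet to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class ExplicitEncodingTransform {

    // Decodes the input with the given charset, then serializes the result as us-ascii.
    static byte[] transform(InputStream xml, Charset sourceCharset) throws TransformerException {
        // The Reader performs the byte-to-character decoding, so the parser
        // never has to guess the encoding of the source.
        Reader reader = new InputStreamReader(xml, sourceCharset);

        // Identity transform; pass a stylesheet Source to newTransformer(...) for a real one.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "us-ascii");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(reader), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        // Sample input with one Latin-1 byte (0xE9) and no XML declaration.
        byte[] input = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] output = transform(new ByteArrayInputStream(input), StandardCharsets.ISO_8859_1);
        System.out.println(new String(output, StandardCharsets.US_ASCII));
    }
}
```

Decoding with the wrong charset would not throw for ISO-8859-1, because every byte value maps to some character, so a bad guess silently produces wrong text. Where the files can be corrected at the source, an accurate encoding attribute in their XML declaration is the cleaner fix.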