UTF-8 to UTF-16 parsing

Date: 2013-02-15 01:50:13

Tags: java xml utf-8 xml-parsing utf-16

I have a UTF-8 XML document that contains some special Chinese characters, and I need to parse it.

DocumentBuilderFactory factory = DocumentBuilderFactory
                    .newInstance();
factory.setIgnoringElementContentWhitespace(true);
factory.setNamespaceAware(true);
factory.setValidating(true);

//byte[] buffer = xmlMsg.getBytes("UTF-16");

logger.info("transformToUTP " + xmlMsg);


//byte[] buffer = soapMessage.getBytes();
//ByteArrayInputStream stream = new ByteArrayInputStream(buffer);               


InputSource is = new InputSource(new ByteArrayInputStream(
                   xmlMsg.getBytes("UTF-16")));

Document doc = factory.newDocumentBuilder().parse(is);
//Document doc = factory.newDocumentBuilder().parse(
//                   new InputSource(new StringReader(xmlMsg)));

XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(getNameSpace());

XPathExpression soapBodyExpr = xpath.compile(BODY_XPATH_EXP);
Node soapBody = (Node) soapBodyExpr.evaluate(doc,
            XPathConstants.NODE);

Node reqMsgNode = soapBody.getFirstChild();

I get a NullPointerException on reqMsgNode.

1 answer:

Answer 0 (score: 1)

Don't convert the XML to a String; parse the raw bytes as they are, using

DocumentBuilder.parse(File) or DocumentBuilder.parse(InputStream)

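The parser takes the encoding from the XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>. If the declaration is missing and there is no byte order mark, it defaults to UTF-8.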
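A minimal sketch of that advice, under some assumptions: the class name ParseXmlBytes, the helper parse(byte[]) and the sample <msg> document are made up for illustration, and the sample bytes are built from a String only to keep the demo self-contained. In real code the bytes would come straight from the file or network stream. Validation is left off here because the sample has no DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; the names are illustrative, not from the question.
class ParseXmlBytes {

    // Hands the raw bytes to the parser so it can pick the encoding
    // from the XML declaration instead of from a Java String round trip.
    static Document parse(byte[] xmlBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample document with two Chinese characters, written as Unicode
        // escapes so the source file encoding does not matter.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<msg>\u4e2d\u6587</msg>";

        // The bytes match the declared encoding (UTF-8). Encoding them as
        // UTF-16 here would contradict the declaration.
        byte[] utf8Bytes = xml.getBytes(StandardCharsets.UTF_8);

        Document doc = parse(utf8Bytes);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the XML is only available as a String, the commented-out StringReader variant in the question is the other safe route: a Reader supplies characters that are already decoded, so no byte encoding is involved.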