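I have the following element in a web service response. As you can see, it's escaped XML dumped in as character data, so the XML parser just treats it as a string and I can't get the data I need out of it with the usual XSLT and XPath tools. I need to turn this ugly string back into XML so that I can read it properly.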
I've tried doing a search-and-replace that simply converts all &lt; to < and all &gt; to >.
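That works fine, but there's a problem: the message.body element can actually contain HTML that isn't valid XML. For all I know it might not even be valid HTML. So if I just replace everything, it will probably blow up when I try to turn the string back into an XML document.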
How can I solve this safely? Is there a good way to do the replacement everywhere except between the message.body opening and closing tags?
<output>&lt;item type="object"&gt;
&lt;ticket.id type="string"&gt;171&lt;/ticket.id&gt;
&lt;ticket.title type="string"&gt;SoapUI Test&lt;/ticket.title&gt;
&lt;ticket.created_at type="string"&gt;2013-12-03 12:50:54&lt;/ticket.created_at&gt;
&lt;ticket.status type="string"&gt;Open&lt;/ticket.status&gt;
&lt;updated type="string"&gt;false&lt;/updated&gt;
&lt;message type="object"&gt;
&lt;message.id type="string"&gt;520&lt;/message.id&gt;
&lt;message.created_at type="string"&gt;2013-12-03 12:50:54.000&lt;/message.created_at&gt;
&lt;message.author type="string"/&gt;
&lt;message.body type="string"&gt;Just a test message...&lt;/message.body&gt;
&lt;/message&gt;
&lt;message type="object"&gt;
&lt;message.id type="string"&gt;521&lt;/message.id&gt;
&lt;message.created_at type="string"&gt;2013-12-03 13:58:32.000&lt;/message.created_at&gt;
&lt;message.author type="string"/&gt;
&lt;message.body type="string"&gt;Another message!&lt;/message.body&gt;
&lt;/message&gt;
&lt;/item&gt;
</output>
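As a sketch of the selective replacement asked about here (not taken from either answer below): walk the raw, still-escaped response with a regex and decode entities everywhere except inside escaped message.body elements. Because each body stays escaped once, the parser turns it back into a plain text node, so whatever broken HTML it holds can no longer break parsing. This assumes the payload uses only the five predefined XML entities, that message.body never nests, and that no attribute value contains a >; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SelectiveUnescape {
    // Escaped <message.body ...>...</message.body>; groups: opening tag, body, closing tag.
    // (?:(?!/&gt;).)*? keeps a self-closing <message.body/> from being read as an opening tag.
    private static final Pattern BODY = Pattern.compile(
            "(?s)(&lt;message\\.body\\b(?:(?!/&gt;).)*?&gt;)(.*?)(&lt;/message\\.body&gt;)");

    // Decodes the five predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // Unescapes everything except the contents of message.body elements.
    static String unescapeOutsideBodies(String escaped) {
        StringBuilder out = new StringBuilder();
        Matcher m = BODY.matcher(escaped);
        int last = 0;
        while (m.find()) {
            out.append(unescape(escaped.substring(last, m.start())));
            out.append(unescape(m.group(1)));
            out.append(m.group(2)); // still escaped: parses back to a text node
            out.append(unescape(m.group(3)));
            last = m.end();
        }
        out.append(unescape(escaped.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "&lt;message&gt;&lt;message.body type=\"string\"&gt;"
                + "&lt;b&gt;broken html&lt;/message.body&gt;&lt;/message&gt;";
        // prints <message><message.body type="string">&lt;b&gt;broken html</message.body></message>
        System.out.println(unescapeOutsideBodies(raw));
    }
}
```

The result can then go straight to a DocumentBuilder: the unclosed <b> in the body survives only as text, so the document stays well-formed.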
Answer 0 (score: 0)
This is actually ripped straight out of a project I'm working on.
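import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Parses a string of XML and returns its root element, or null if parsing fails.
// "logger" is whatever logger the surrounding class already uses.
private Node stringToNode(String textContent) {
    Element node = null;
    try {
        // Reading through a StringReader avoids String.getBytes(), which
        // encodes with the platform default charset.
        node = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(textContent)))
                .getDocumentElement();
    } catch (SAXException | IOException | ParserConfigurationException e) {
        logger.error(e.getMessage(), e);
    }
    return node;
}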
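This returns the root element of the document parsed from the string. I use it like this to put the result back into the original document: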
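// XML_HEADER (not shown) presumably holds the XML declaration; soapBody and
// nextChild come from the surrounding code.
// Drop everything up to and including the declaration: the parse fails if
// anything, even whitespace, precedes it.
if (textContent.contains(XML_HEADER)) {
    textContent = textContent.substring(textContent.indexOf(XML_HEADER) + XML_HEADER.length());
}
Node newNode = stringToNode(textContent);
if (newNode != null) {
    // Nodes from another document must be imported before they can be appended.
    Node importedNode = soapBody.getOwnerDocument().importNode(newNode, true);
    nextChild.setTextContent(null); // remove the escaped text
    nextChild.appendChild(importedNode);
}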
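To see this answer's approach end to end without the surrounding SOAP code, here is a self-contained sketch run against a cut-down version of the question's response. The class and method names are hypothetical. It assumes the escaped payload has exactly one root element, which parse().getDocumentElement() requires, and it does nothing about the malformed HTML the question worries about.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EmbeddedXmlDemo {
    // Parses the text content of the first <output> element and replaces it with real nodes.
    static Document expandOutput(String response) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(response)));
        Element output = (Element) doc.getElementsByTagName("output").item(0);

        // getTextContent() returns the payload already unescaped by the parser.
        String payload = output.getTextContent();
        Element root = builder.parse(new InputSource(new StringReader(payload)))
                .getDocumentElement();

        // Nodes from another document must be imported before they can be appended.
        Node imported = doc.importNode(root, true);
        output.setTextContent(null); // drop the escaped text
        output.appendChild(imported);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String response = "<output>&lt;item type=\"object\"&gt;"
                + "&lt;ticket.id type=\"string\"&gt;171&lt;/ticket.id&gt;"
                + "&lt;/item&gt;</output>";
        Document doc = expandOutput(response);
        // prints 171
        System.out.println(doc.getElementsByTagName("ticket.id").item(0).getTextContent());
    }
}
```

Once the nodes are in place, ordinary XPath such as //ticket.id works on the host document.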
Answer 1 (score: 0)
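This is my current solution. You pass it an XPath that selects the mangled nodes, plus a set of names of elements that may contain messy HTML or other problems. It works roughly like this:
1. Make a working copy of the document and, for each node the XPath selects, take its text content (which the parser has already unescaped).
2. Use a regex to wrap the contents of each excluded element in a CDATA section.
3. Wrap the whole string in a temporary element with a random name, so it has a single root.
4. Parse that string.
5. Replace the node's text with the parsed child nodes.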
The regex solution in step 2 may not be foolproof, but I haven't really seen a better one yet. If you have, please let me know!
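The regex step can be tried on its own. The sketch below is a hedged variant of the pattern used in CDataFixer.parse further down, not the answer's exact code: it skips self-closing tags such as <message.body type="string"/>, which [^>]* would otherwise accept as an opening tag, and it splits any ]]> in the body across two CDATA sections, because that sequence would end a section early. It still assumes an excluded element is never nested inside another element of the same name. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CDataWrapper {
    // Wraps the contents of every <elementName ...>...</elementName> in a CDATA section.
    static String wrapInCData(String text, String elementName) {
        // (?<!/)> rejects self-closing tags; (?s) lets the body span several lines.
        Pattern p = Pattern.compile(String.format(
                "(?s)(<%1$s\\b[^>]*(?<!/)>)(.*?)(</%1$s>)", Pattern.quote(elementName)));
        Matcher m = p.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // "]]>" would close the CDATA section early, so split it across two sections.
            String body = m.group(2).replace("]]>", "]]]]><![CDATA[>");
            String replacement = m.group(1) + "<![CDATA[" + body + "]]>" + m.group(3);
            // quoteReplacement stops '$' or '\' in the body from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, wrapInCData("<m><message.body>a <b>b</message.body></m>", "message.body") leaves the unclosed <b> inside a CDATA section, so the string parses and the body comes back as the text "a <b>b".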
CDataFixer
import java.util.*;
import java.util.regex.Pattern;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class CDataFixer
{
    private final XmlHelper xml = XmlHelper.getInstance();

    public Document fix(Document document, String nodesToFix, Set<String> excludes) throws XPathExpressionException, XmlException
    {
        return fix(document, xml.newXPath().compile(nodesToFix), excludes);
    }

    private Document fix(Document document, XPathExpression nodesToFix, Set<String> excludes) throws XPathExpressionException, XmlException
    {
        // Work on a copy so the caller's document is left untouched.
        Document wc = xml.copy(document);
        NodeList nodes = (NodeList) nodesToFix.evaluate(wc, XPathConstants.NODESET);
        int nodeCount = nodes.getLength();
        for(int n=0; n<nodeCount; n++)
            parse(nodes.item(n), excludes);
        return wc;
    }

    private void parse(Node node, Set<String> excludes) throws XmlException
    {
        // The parser has already unescaped this text, so it is the raw inner markup.
        String text = node.getTextContent();

        // Wrap the contents of each excluded element in CDATA so broken HTML stays plain text.
        // (?<!/) stops a self-closing tag such as <message.body/> from counting as an opening tag.
        for(String exclude : excludes)
        {
            String regex = String.format("(?s)(<%1$s\\b[^>]*(?<!/)>)(.*?)(</%1$s>)", Pattern.quote(exclude));
            text = text.replaceAll(regex, "$1<![CDATA[$2]]>$3");
        }

        // Wrap everything in a temporary root so fragments with several top-level nodes parse.
        String randomNode = "tmp_"+UUID.randomUUID().toString();
        text = String.format("<%1$s>%2$s</%1$s>", randomNode, text);
        NodeList parsed = xml
            .parse(text)
            .getFirstChild()
            .getChildNodes();

        // Replace the escaped text with the parsed nodes.
        node.setTextContent(null);
        for(int n=0; n<parsed.getLength(); n++)
            node.appendChild(node.getOwnerDocument().importNode(parsed.item(n), true));
    }
}
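The temporary-root trick in parse() above is what lets this answer handle a payload with several top-level nodes or leading text, which Answer 0's stringToNode cannot, since a document must have exactly one root element. Here it is isolated into a JDK-only sketch with a hypothetical class name. Unlike CDataFixer it does no CDATA protection, so it assumes the fragment is already well-formed apart from lacking a single root.

```java
import java.io.StringReader;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FragmentReplacer {
    // Replaces target's children with the nodes parsed from an XML fragment.
    static void replaceWithParsed(Element target, String fragment) throws Exception {
        // A random name avoids clashing with any element that appears in the fragment.
        String wrapper = "tmp_" + UUID.randomUUID();
        String wrapped = "<" + wrapper + ">" + fragment + "</" + wrapper + ">";
        Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        NodeList children = parsed.getDocumentElement().getChildNodes();
        target.setTextContent(null); // remove the old text first
        for (int i = 0; i < children.getLength(); i++) {
            // importNode copies, so the NodeList of the parsed document is not modified.
            target.appendChild(target.getOwnerDocument().importNode(children.item(i), true));
        }
    }
}
```

Called with "text <a>1</a> more <b/>", the target ends up with four children: two text nodes and the elements a and b.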
XmlHelper
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public final class XmlHelper
{
    private static final XmlHelper instance = new XmlHelper();

    public static XmlHelper getInstance()
    {
        return instance;
    }

    private final SAXTransformerFactory transformerFactory;
    private final DocumentBuilderFactory documentBuilderFactory;
    private final XPathFactory xpathFactory;

    private XmlHelper()
    {
        documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setNamespaceAware(true);
        xpathFactory = XPathFactory.newInstance();
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE))
            throw new RuntimeException("Failed to create SAX-compatible TransformerFactory.");
        transformerFactory = (SAXTransformerFactory) tf;
    }

    public DocumentBuilder newDocumentBuilder()
    {
        try
        {
            return documentBuilderFactory.newDocumentBuilder();
        }
        catch (ParserConfigurationException e)
        {
            throw new RuntimeException("Failed to create new "+DocumentBuilder.class, e);
        }
    }

    public XPath newXPath()
    {
        return xpathFactory.newXPath();
    }

    public Transformer newIdentityTransformer(boolean omitXmlDeclaration, boolean indent)
    {
        try
        {
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitXmlDeclaration ? "yes" : "no");
            return transformer;
        }
        catch (TransformerConfigurationException e)
        {
            throw new RuntimeException("Failed to create Transformer instance: "+e.getMessage(), e);
        }
    }

    public Templates newTemplates(String xslt) throws XmlException
    {
        try
        {
            return transformerFactory.newTemplates(new DOMSource(parse(xslt)));
        }
        catch (TransformerConfigurationException e)
        {
            throw new RuntimeException("Failed to create templates: "+e.getMessage(), e);
        }
    }

    public Document parse(String xml) throws XmlException
    {
        return parse(new InputSource(new StringReader(xml)));
    }

    public Document parse(InputSource xml) throws XmlException
    {
        try
        {
            return newDocumentBuilder().parse(xml);
        }
        catch (SAXException e)
        {
            throw new XmlException("Failed to parse xml: "+e.getMessage(), e);
        }
        catch (IOException e)
        {
            throw new XmlException("Failed to read xml: "+e.getMessage(), e);
        }
    }

    public String toString(Node node)
    {
        return toString(node, true, false);
    }

    public String toString(Node node, boolean omitXMLDeclaration, boolean indent)
    {
        try
        {
            StringWriter writer = new StringWriter();
            newIdentityTransformer(omitXMLDeclaration, indent)
                .transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        }
        catch (TransformerException e)
        {
            throw new RuntimeException("Failed to transform XML into string: " + e.getMessage(), e);
        }
    }

    public Document copy(Document document)
    {
        DOMSource source = new DOMSource(document);
        DOMResult result = new DOMResult();
        try
        {
            newIdentityTransformer(true, false)
                .transform(source, result);
            return (Document) result.getNode();
        }
        catch (TransformerException e)
        {
            throw new RuntimeException("Failed to copy XML: " + e.getMessage(), e);
        }
    }
}
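XmlHelper.parse and CDataFixer throw an XmlException that the answer never shows. Assuming it is just a checked exception carrying a message and a cause (the only constructor the code above uses), a stand-in like this should let the two classes compile. It is a guess at the missing class, not the author's code.

```java
// Hypothetical stand-in for the answer's unshown XmlException.
// Only the (String, Throwable) constructor is needed by XmlHelper.parse.
class XmlException extends Exception {
    private static final long serialVersionUID = 1L;

    public XmlException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

Making it a checked exception forces callers of CDataFixer.fix to deal with unparseable payloads explicitly, which matches how the answer's method signatures declare it.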