I am using an XML Transformer to convert one XML document into another. Some non-English characters are not converted correctly.
Original XML:
<?xml version="1.0" encoding="UTF-8"?>
<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0">
<RR_KeyPersonExpanded_2_0:KeyPerson>
<RR_KeyPersonExpanded_2_0:Profile>
<RR_KeyPersonExpanded_2_0:Name>
<globLib:PrefixName>候.</globLib:PrefixName>
<globLib:FirstName>Lakshmi</globLib:FirstName>
<globLib:MiddleName>AB</globLib:MiddleName>
<globLib:LastName>Sørensen</globLib:LastName>
</RR_KeyPersonExpanded_2_0:Name>
</RR_KeyPersonExpanded_2_0:Profile>
</RR_KeyPersonExpanded_2_0:KeyPerson>
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0>
removeemptytags.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" omit-xml-declaration="yes" encoding="UTF-8" method="xml"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(descendant-or-self::*[text()[normalize-space()] | @*])]"/>
</xsl:stylesheet>
Java code:
public String removeEmptyTags(String xml) {
String filteredXML = "";
try (OutputStream bos = new ByteArrayOutputStream();) {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
Transformer transformer = transformerFactory.newTransformer(xsltSource);
StreamResult result = new StreamResult(bos);
transformer.transform(inputXMLSource, result);
bos.flush();
filteredXML = bos.toString();
} catch (Exception e) {
logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
throw new ParsingException(e.getMessage());
}
return filteredXML;
}
Output XML:
<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0">
<RR_KeyPersonExpanded_2_0:KeyPerson>
<RR_KeyPersonExpanded_2_0:Profile>
<RR_KeyPersonExpanded_2_0:Name>
<globLib:PrefixName>候.</globLib:PrefixName>
<globLib:FirstName>Lakshmi</globLib:FirstName>
<globLib:MiddleName>AB</globLib:MiddleName>
<globLib:LastName>Sørensen</globLib:LastName>
</RR_KeyPersonExpanded_2_0:Name>
</RR_KeyPersonExpanded_2_0:Profile>
</RR_KeyPersonExpanded_2_0:KeyPerson>
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0>
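As you can see, the non-English words turn into a bunch of meaningless characters. I tried changing the encoding in the XSLT to "UTF-16", but that did not work. Has anyone here run into the same problem?
Answer 0 (score: 2)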
To get that many odd characters, you appear to have more than one encoding problem.
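First, when reading the XML into the xml String (code not shown). You probably forgot to specify UTF-8 encoding, but since we cannot see how you did it, we cannot really help with that part.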
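Second, when calling bos.toString(). That call decodes the bytes with the platform default charset, which is not necessarily UTF-8. If you want the result as a String, don't use an OutputStream. Use a StringWriter (see the code below).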
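Third, when writing the String to a file (code not shown). Again, we cannot really help with this one because we don't know how you did it, though you probably forgot to specify UTF-8 encoding.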
public String removeEmptyTags(String xml) {
try (StringWriter out = new StringWriter()) {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
Transformer transformer = transformerFactory.newTransformer(xsltSource);
transformer.transform(inputXMLSource, new StreamResult(out));
return out.toString();
} catch (Exception e) {
logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
throw new ParsingException(e.getMessage());
}
}
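To see why the second point matters, here is a small sketch of what goes wrong when UTF-8 bytes are decoded with a different charset. It assumes the platform default behaves like ISO-8859-1, which is only a guess about the asker's environment (since JDK 18 the default is UTF-8). The class and method names are made up for illustration. If you must keep the ByteArrayOutputStream, passing the charset name to toString avoids the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: decodes UTF-8 bytes as ISO-8859-1, standing in for
    // bos.toString() on a platform whose default charset is not UTF-8.
    static String decodeWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: same bytes, but the charset is named explicitly.
    static String decodeAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(utf8, 0, utf8.length);
        try {
            return bos.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "S\u00F8rensen"; // "Sørensen", escaped so the source file encoding does not matter

        // 'ø' is two bytes in UTF-8 (0xC3 0xB8); each byte becomes its own character.
        String broken = decodeWithWrongCharset(name);
        System.out.println(broken.length());            // 9 instead of 8
        System.out.println(decodeAsUtf8(name).equals(name)); // true
    }
}
```

The StringWriter version above sidesteps the issue entirely, because the transformer writes characters rather than bytes and no decoding step is needed.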
Actually, it would be better to transform directly from one file to another and let the XML library figure out the encoding:
public void removeEmptyTags(Path inFile, Path outFile) {
try (InputStream in = Files.newInputStream(inFile);
OutputStream out = Files.newOutputStream(outFile)) {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
Transformer transformer = transformerFactory.newTransformer(xsltSource);
transformer.transform(new StreamSource(in), new StreamResult(out));
} catch (Exception e) {
logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
throw new ParsingException(e.getMessage());
}
}
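If the XML has to pass through a String anyway, the first and third points come down to naming the charset whenever bytes become characters or the reverse. Since the question does not show that code, the following is only a sketch of one way to do it; readUtf8, writeUtf8, and roundTrip are hypothetical helper names, not part of the asker's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileIo {

    // Reads the whole file and decodes it explicitly as UTF-8.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Encodes the text explicitly as UTF-8 and writes it to the file.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Writes the text to a temporary file and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".xml");
            try {
                writeUtf8(tmp, text);
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<LastName>S\u00F8rensen</LastName>"; // "Sørensen", escaped
        System.out.println(roundTrip(xml).equals(xml)); // true
    }
}
```

This only helps when the file really is UTF-8. Handing the stream to the transformer, as in the file-based method above, lets the parser honor whatever encoding the XML declaration states.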