XML transformation fails

Date: 2017-04-14 20:21:46

Tags: java xml xslt encoding utf-8

I'm using an XML transformer to convert one XML document into another. Some non-English characters are not converted correctly.

Original XML:

<?xml version="1.0" encoding="UTF-8"?>
<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0">
   <RR_KeyPersonExpanded_2_0:KeyPerson>
      <RR_KeyPersonExpanded_2_0:Profile>
         <RR_KeyPersonExpanded_2_0:Name>
            <globLib:PrefixName>候.</globLib:PrefixName>
            <globLib:FirstName>Lakshmi</globLib:FirstName>
            <globLib:MiddleName>AB</globLib:MiddleName>
            <globLib:LastName>Sørensen</globLib:LastName>
         </RR_KeyPersonExpanded_2_0:Name>
      </RR_KeyPersonExpanded_2_0:Profile>
   </RR_KeyPersonExpanded_2_0:KeyPerson>
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0>

removeemptytags.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" omit-xml-declaration="yes" encoding="UTF-8" method="xml"/>
<xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(descendant-or-self::*[text()[normalize-space()] | @*])]"/>

</xsl:stylesheet>

Java code:

public String removeEmptyTags(String xml) {
    String filteredXML = "";
    try (OutputStream bos = new ByteArrayOutputStream();) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        StreamResult result = new StreamResult(bos);
        transformer.transform(inputXMLSource, result);
        bos.flush();
        filteredXML = bos.toString();
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
    return filteredXML;
}

Output XML:

<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0">
<RR_KeyPersonExpanded_2_0:KeyPerson>
<RR_KeyPersonExpanded_2_0:Profile>
<RR_KeyPersonExpanded_2_0:Name>
<globLib:PrefixName>候.</globLib:PrefixName>
<globLib:FirstName>Lakshmi</globLib:FirstName>
<globLib:MiddleName>AB</globLib:MiddleName>
<globLib:LastName>Sørensen</globLib:LastName>
</RR_KeyPersonExpanded_2_0:Name>
</RR_KeyPersonExpanded_2_0:Profile>
</RR_KeyPersonExpanded_2_0:KeyPerson>
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0>

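As you can see, the non-English words turned into a bunch of meaningless characters. I tried changing the encoding in the XSLT to "UTF-16", but that didn't help. Has anyone here run into the same problem?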

1 answer:

Answer 0 (score: 2)

Since you're getting that many weird characters, you probably have more than one encoding problem.
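To see why several mistakes produce so much garbage, here is a minimal sketch, assuming the wrong charset is ISO-8859-1 (a stand-in for whatever non-UTF-8 default your platform uses); the class and method names are hypothetical. Each time UTF-8 bytes are decoded with the wrong charset, every non-ASCII character becomes two or more characters, and a second mistake garbles those again.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode as UTF-8, then decode the bytes with the wrong
    // charset (ISO-8859-1 stands in for a non-UTF-8 platform default).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "S\u00f8rensen";   // "Sørensen", escaped so the source file encoding doesn't matter
        String once = misdecode(name);   // one encoding mistake
        String twice = misdecode(once);  // the same mistake made again at a later step
        System.out.println(name.length() + " " + once.length() + " " + twice.length()); // prints 8 9 11
    }
}
```

After one mistake "Sørensen" reads "SÃ¸rensen"; after two, the single "ø" has become four characters.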

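First, when you read the XML into the xml String (code not shown). You probably forgot to specify the UTF-8 encoding there, but since we can't see how you did it, we can't really help with that part.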

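Second, when you call bos.toString(). ByteArrayOutputStream.toString() with no argument decodes the bytes using the platform's default charset, not the UTF-8 that the stylesheet's xsl:output wrote. If you want the result as a String, don't use an OutputStream at all; use a StringWriter (see the code below).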

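Third, when you write the String to a file (code not shown). Again, we can't really help with this one because we don't know how you did it, although you probably forgot to specify the UTF-8 encoding there too.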
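For points one and three, the fix is to name the charset explicitly whenever a String crosses a byte boundary. A minimal sketch, assuming the file is UTF-8 and using a temporary file in place of your real input and output paths; the class and method names are hypothetical. On Java 11+, Files.readString and Files.writeString default to UTF-8 and do the same job.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileIo {
    // Point 1: decode the file's bytes explicitly as UTF-8 when reading it into a String.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Point 3: encode the String explicitly as UTF-8 when writing it back out.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("keyperson", ".xml"); // hypothetical stand-in for your real file
        String xml = "<LastName>S\u00f8rensen</LastName>";
        writeUtf8(tmp, xml);
        System.out.println(readUtf8(tmp).equals(xml)); // prints true
        Files.delete(tmp);
    }
}
```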

public String removeEmptyTags(String xml) {
    try (StringWriter out = new StringWriter()) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        transformer.transform(inputXMLSource, new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
}
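A self-contained sketch of the String-to-String version, assuming the question's removeemptytags.xsl is inlined as a string instead of loaded from the classpath, and using a StringReader for the input (a swap for the getBytes("UTF-8") round trip, so no charset is involved on the way in either). For comparison it also shows the minimal fix if you keep the ByteArrayOutputStream: decode with the charset that xsl:output declares. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RemoveEmptyTagsDemo {
    // removeemptytags.xsl from the question, inlined so the sketch needs no classpath resource.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:strip-space elements=\"*\"/>"
      + "<xsl:output indent=\"yes\" omit-xml-declaration=\"yes\" encoding=\"UTF-8\" method=\"xml\"/>"
      + "<xsl:template match=\"@*|node()\"><xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy></xsl:template>"
      + "<xsl:template match=\"*[not(descendant-or-self::*[text()[normalize-space()] | @*])]\"/>"
      + "</xsl:stylesheet>";

    static Transformer newTransformer() throws TransformerException {
        return TransformerFactory.newInstance().newTransformer(new StreamSource(new StringReader(XSL)));
    }

    // Characters in, characters out: no charset is chosen at either end.
    static String viaStringWriter(String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // If you keep the ByteArrayOutputStream, decode with the charset declared
    // in xsl:output (UTF-8) instead of calling toString() with no argument.
    static String viaBytes(String xml) throws TransformerException, UnsupportedEncodingException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(bos));
        return bos.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // <MiddleName/> has no text and no attributes, so the stylesheet drops it.
        String xml = "<Name><FirstName>Lakshmi</FirstName><MiddleName/>"
                   + "<LastName>S\u00f8rensen</LastName></Name>";
        System.out.println(viaStringWriter(xml));
        System.out.println(viaBytes(xml));
    }
}
```

Both methods should print the same result: the empty MiddleName element is gone and "Sørensen" survives intact.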

Even better is to transform directly from file to file and let the XML library work out the encoding:

public void removeEmptyTags(Path inFile, Path outFile) {
    try (InputStream in = Files.newInputStream(inFile);
         OutputStream out = Files.newOutputStream(outFile)) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        transformer.transform(new StreamSource(in), new StreamResult(out));
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
}
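The file-to-file version works because the parser reads raw bytes and honors the input's own encoding declaration, and the serializer writes raw bytes in the encoding set by xsl:output, so your code never picks a charset. A minimal sketch of that behavior, assuming an identity transformer (no stylesheet) in place of removeemptytags.xsl and in-memory byte streams in place of the files; the class and method names are hypothetical. The input is deliberately ISO-8859-1 to show the parser switching encodings on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ByteStreamEncodingDemo {
    // Bytes in, bytes out: the parser decodes according to the input's XML
    // declaration, and the serializer encodes according to OutputKeys.ENCODING.
    static byte[] reencode(byte[] input) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity, stand-in for the XSLT
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new ByteArrayInputStream(input)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><n>S\u00f8rensen</n>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1); // "ø" is the single byte 0xF8 here
        byte[] utf8 = reencode(latin1);
        // Only for display: decode the output with the charset the serializer used.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```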