Serializing a DOM Document as UTF-8

Time: 2012-07-02 21:09:26

Tags: java dom utf-8 utf-16

I have been trying to remove every com.sun.org.apache.xml.internal package from my code and replace them with stable alternatives.

This is one of the methods I am replacing:

import com.sun.org.apache.xml.internal.serialize.LineSeparator;
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
...
...

    /**
     * @param source the document to serialize
     * @param target the writer that receives the XML
     * @throws IOException if writing to the target fails
     */
    protected static void serialize( Document source, Writer target ) throws IOException
    {
        OutputFormat outputFormat = new OutputFormat( source );
        outputFormat.setLineSeparator( LineSeparator.Windows );
        // outputFormat.setIndenting( true );

        outputFormat.setLineWidth( 0 );
        outputFormat.setPreserveSpace( true );

        XMLSerializer serializer = new XMLSerializer( target, outputFormat );
        serializer.asDOMSerializer();
        serializer.serialize( source );
    } // end serialize

This is the alternative I found:

/**
 * @param source the document to serialize
 * @param target the writer that receives the XML
 * @throws Exception if the DOM implementation cannot be loaded or writing fails
 */
protected static void serialize( Document source, Writer target ) throws Exception
{
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation( "LS" );
    LSSerializer writer = impl.createLSSerializer();
    target.write( writer.writeToString(source) );
} // end serialize

However, the XML it generates is different.

It now produces:

<?xml version="1.0" encoding="UTF-16"?>

How can I modify it to produce UTF-8 instead?

1 answer:

Answer 0 (score: 0)

I think you have to use LSOutput and write through it instead of calling writeToString():

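LSOutput domOutput = impl.createLSOutput();
domOutput.setEncoding(StandardCharsets.UTF_8.name());
domOutput.setCharacterStream(stringWriter);
// Serialize through the LSOutput so the declared encoding is used
writer.write(source, domOutput);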

For details, see the answer to this question:

Change the com.sun.org.apache.xml.internal.serialize.XMLSerializer & com.sun.org.apache.xml.internal.serialize.OutputFormat
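
Putting the question's method and the LSOutput suggestion together, a minimal sketch might look like the following. The class name Utf8DomSerializer and the main method are made up for illustration, and the setNewLine call assumes the Windows line endings of the original XMLSerializer version are still wanted. As far as I know, writeToString() always declares UTF-16 because it returns a Java String, which is why the sketch goes through write() with an LSOutput.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical class name, used only for this example.
public class Utf8DomSerializer {

    /** Serializes source to target with UTF-8 declared in the XML prolog. */
    public static void serialize(Document source, Writer target) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();

        // Assumption: keep the Windows line separator from the original method.
        writer.setNewLine("\r\n");

        // The LSOutput carries both the destination and the declared encoding.
        LSOutput output = impl.createLSOutput();
        output.setEncoding(StandardCharsets.UTF_8.name());
        output.setCharacterStream(target);

        // write() honors the LSOutput encoding, unlike writeToString().
        writer.write(source, output);
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document just to exercise serialize().
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        doc.appendChild(doc.createElement("root"));

        StringWriter out = new StringWriter();
        serialize(doc, out);
        System.out.println(out);
    }
}
```

Note that with a character stream, setEncoding only controls what the XML declaration says. The bytes that end up on disk depend on how the caller's Writer encodes them, so it should itself be created with UTF-8 (for example an OutputStreamWriter with StandardCharsets.UTF_8). Passing an OutputStream via setByteStream instead would let the serializer do the encoding.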