Copying an element

Time: 2017-03-07 21:23:35

Tags: java xml stax

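Background: I am using StAX to split an XML document into parts, for example to save each paragraph separately. To do this, I read the document with an XMLEventReader and iterate over the events. When I reach an element that I want to store, I copy it into a StringWriter and keep the resulting string.

However, I run into a problem when the element contains a processing instruction. I reproduced the problem with the following code: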

package com.util.xml;

import static org.assertj.core.api.Assertions.assertThat;

import java.io.StringWriter;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.events.XMLEvent;

import org.apache.commons.io.IOUtils;
import org.junit.Test;

import javanet.staxutils.XMLStreamUtils;
import javanet.staxutils.io.StreamEventWriter;

public class XmlUtilTest {

    @Test
    public void xml_with_processing_instruction_is_retained() throws Exception {
        final XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        final String xml = "<p><?processing-instruction user=\"stuart\"?>Title</p>";
        final XMLEventReader eventReader = inputFactory.createXMLEventReader(IOUtils.toInputStream(xml));

        final StringWriter stringWriter = new StringWriter();
        while (eventReader.hasNext()) {
            final XMLEvent event = eventReader.peek();
            if (event.getEventType() == XMLStreamConstants.START_ELEMENT) {
                XMLStreamUtils.copy(eventReader, new StreamEventWriter(stringWriter));
                break;
            } else {
                eventReader.nextEvent();
            }
        }

        final String output = stringWriter.toString();
        assertThat(output).isEqualTo(xml);
    }
}

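I expect the output to be identical to the input (it is a plain copy), but the space between the processing instruction target (processing-instruction) and its data (user="stuart") is removed: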

org.junit.ComparisonFailure: expected:<...ocessing-instruction[ ]user="stuart"?>Title...> but was:<...ocessing-instruction[]user="stuart"?>Title...>
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at com.util.xml.xmlcontent.ingestion.XmlUtilTest.xml_with_processing_instruction_is_retained(XmlUtilTest.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

Am I missing something?

1 answer:

Answer 0: (score: 0)

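In this case, the XML specification is quite vague about exactly what a parser should report to the application. The syntax of a processing instruction is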

PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

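and in most other contexts "S" denotes whitespace that is treated as ignorable, so many people have assumed that the whitespace in a processing instruction is also "ignorable" and is therefore not reported to the application.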

As pointed out here: How do I format and read XML processing instructions using Java StAX? The StAX specification is notoriously vague about these details.
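
If the parser reports the target and the data without the separating whitespace, whatever writes the processing instruction back out has to reinsert a separator itself. The following is only a minimal sketch of that idea using the standard javax.xml.stream API. The class name PiSerializerSketch and both method names are hypothetical, and it assumes (as the JDK's bundled StAX parser appears to do) that getData() returns the data with the leading whitespace already stripped. It is not how staxutils serializes events.

```java
import java.io.StringReader;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.ProcessingInstruction;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, for illustration only.
class PiSerializerSketch {

    // Rebuilds the PI markup from target and data, reinserting a single
    // space between them when there is any data to write.
    public static String serialize(ProcessingInstruction pi) {
        final String data = pi.getData();
        if (data == null || data.isEmpty()) {
            return "<?" + pi.getTarget() + "?>";
        }
        return "<?" + pi.getTarget() + " " + data + "?>";
    }

    // Returns the serialized form of the first processing instruction
    // found in the given XML, or null if the document contains none.
    public static String firstProcessingInstruction(String xml) {
        try {
            final XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    final XMLEvent event = reader.nextEvent();
                    if (event.isProcessingInstruction()) {
                        return serialize((ProcessingInstruction) event);
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so that callers do not need a throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                firstProcessingInstruction("<p><?processing-instruction user=\"stuart\"?>Title</p>"));
    }
}
```

Note that a single space is the best this approach can do: if the original document used several spaces, a tab, or a newline after the target, that information is already gone by the time the event reaches the application, so a byte-for-byte comparison like the one in the question can still fail.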