Reading multiple XML documents from a socket in Java

Time: 2009-05-28 13:42:53

Tags: java xml

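I'm writing a client that needs to read multiple consecutive small XML documents over a socket. I can assume that the encoding is always UTF-8 and that there may optionally be whitespace between documents. The documents should ultimately end up in DOM objects. What is the best way to accomplish this?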

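The crux of the problem is that the parser expects a single document in the stream and treats the rest of the content as garbage. I thought I could end each document artificially by tracking element depth, and then create a new reader on the existing input stream. E.g. something like: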

// Broken 
public void parseInputStream(InputStream inputStream) throws Exception
{
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLOutputFactory xof = XMLOutputFactory.newInstance();
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();        
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document doc = documentBuilder.newDocument();
    XMLEventWriter domWriter = xof.createXMLEventWriter(new DOMResult(doc));
    XMLStreamReader xmlStreamReader = factory.createXMLStreamReader(inputStream);
    XMLEventReader reader = factory.createXMLEventReader(xmlStreamReader);
    int depth = 0;

    while (reader.hasNext()) {
        XMLEvent evt = reader.nextEvent();
        domWriter.add(evt);

        switch (evt.getEventType()) {
        case XMLEvent.START_ELEMENT:
            depth++;
            break;

        case XMLEvent.END_ELEMENT:
            depth--;

            if (depth == 0) 
            {                       
                domWriter.add(eventFactory.createEndDocument());
                System.out.println(doc);
                reader.close();
                xmlStreamReader.close();

                xmlStreamReader = factory.createXMLStreamReader(inputStream);
                reader = factory.createXMLEventReader(xmlStreamReader);

                doc = documentBuilder.newDocument();
                domWriter = xof.createXMLEventWriter(new DOMResult(doc));    
                domWriter.add(eventFactory.createStartDocument());
            }
            break;                    
        }
    }
}

However, running this on input such as <a></a><b></b><c></c> prints the first document and then throws an XMLStreamException. What is the correct way to do this?

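Clarification: unfortunately, the protocol is fixed by the server and cannot be changed, so prefixing each document with its length or wrapping the contents won't work.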

9 Answers:

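Answer 0 (score: 3)

  • Prefix each document with its length (in bytes).
  • Read the length of the first document from the socket.
  • Read that many bytes from the socket, dumping them into a ByteArrayOutputStream.
  • Create a ByteArrayInputStream from the result.
  • Parse that ByteArrayInputStream to get the first document.
  • Repeat for the second document, and so on.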
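A minimal sketch of these steps, assuming the sender writes each length as a 4-byte big-endian int (the format DataOutputStream.writeInt produces); the class name LengthPrefixedReader is hypothetical. It reads straight into a byte array with readFully rather than going through a ByteArrayOutputStream, which amounts to the same thing. Per the question's clarification, this only applies if you can change what the sender writes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reader for a length-prefixed stream of XML documents.
public class LengthPrefixedReader {
    private final DataInputStream in;
    private final DocumentBuilder builder;

    public LengthPrefixedReader(InputStream socketIn) throws Exception {
        this.in = new DataInputStream(socketIn);
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Returns the next document, or null if the stream ended before another length prefix.
    public Document next() throws Exception {
        int length;
        try {
            length = in.readInt();      // assumed: 4-byte big-endian byte count
        } catch (EOFException e) {
            return null;
        }
        byte[] buf = new byte[length];
        in.readFully(buf);              // blocks until the whole document has arrived
        return builder.parse(new ByteArrayInputStream(buf));
    }
}
```

Because each document is parsed from its own byte array, the parser never sees the bytes of the next document.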

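Answer 1 (score: 1)

IIRC, an XML document can have comments and processing instructions after the root element, so there's no reliable way to tell exactly when you've reached the end of a document.

A couple of ways of handling this have already been mentioned. Another is to put a character or byte that is illegal in XML into the stream between documents, such as NUL (zero). The advantage is that you don't need to alter the documents, and you never need to buffer an entire document.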
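A sketch of the delimiter idea, assuming the sender terminates each document with a single NUL byte (0x00). NUL is a safe choice for UTF-8 XML because the byte 0x00 never occurs inside the UTF-8 encoding of any other character, and U+0000 is not allowed in XML at all. NulSplittingInputStream is a hypothetical name: it reports end-of-stream at each NUL, so the parser reads each document directly off the socket without a separate buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical wrapper that makes each NUL-terminated document look like a complete stream.
public class NulSplittingInputStream extends InputStream {
    private final PushbackInputStream in;
    private boolean endOfDocument = true; // true once the current document's NUL (or EOF) is seen

    public NulSplittingInputStream(InputStream in) {
        this.in = new PushbackInputStream(in);
    }

    @Override
    public int read() throws IOException {
        if (endOfDocument) {
            return -1;
        }
        int b = in.read();
        if (b == -1 || b == 0) { // NUL terminates the current document
            endOfDocument = true;
            return -1;
        }
        return b;
    }

    // Blocks until the next document starts; returns false if the stream has ended instead.
    public boolean nextDocument() throws IOException {
        int b = in.read();
        if (b == -1) {
            return false;
        }
        in.unread(b);
        endOfDocument = false;
        return true;
    }

    // The XML parser may close its input when it finishes; keep the socket open.
    @Override
    public void close() {
    }
}
```

Usage would be a loop such as `while (in.nextDocument()) { Document doc = builder.parse(in); ... }`, where each call to DocumentBuilder.parse sees exactly one document.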

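Answer 2 (score: 1)

Just swap the FileInputStream for whatever stream you have (e.g. the socket's input stream). The XMLStream wrapper surrounds the input with a synthetic <root> element, so the parser sees a single document: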

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class LogParser {

    private XMLInputFactory inputFactory = null;
    private XMLStreamReader xmlReader = null;
    InputStream is;
    private int depth;
    private QName rootElement;

    private static class XMLStream extends InputStream
    {
        InputStream delegate;
        StringReader startroot = new StringReader("<root>");
        StringReader endroot = new StringReader("</root>");

        XMLStream(InputStream delegate)
        {
            this.delegate = delegate;
        }

        public int read() throws IOException {
            int c = startroot.read();
            if(c==-1)
            {
                c = delegate.read();
            }
            if(c==-1)
            {
                c = endroot.read();
            }
            return c;
        }

    }

    public LogParser() {
        inputFactory = XMLInputFactory.newInstance();
    }

    public void read() throws Exception {
        is = new XMLStream(new FileInputStream(new File(
            "./myfile.log")));
        xmlReader = inputFactory.createXMLStreamReader(is);

        while (xmlReader.hasNext()) {
            printEvent(xmlReader);
            xmlReader.next();
        }
        xmlReader.close();

    }

    public void printEvent(XMLStreamReader xmlr) throws Exception {
        switch (xmlr.getEventType()) {
        case XMLStreamConstants.END_DOCUMENT:
            System.out.println("finished");
            break;
        case XMLStreamConstants.START_ELEMENT:
            System.out.print("<");
            printName(xmlr);
            printNamespaces(xmlr);
            printAttributes(xmlr);
            System.out.print(">");
            if(rootElement==null && depth==1)
            {
                rootElement = xmlr.getName();
            }
            depth++;
            break;
        case XMLStreamConstants.END_ELEMENT:
            System.out.print("</");
            printName(xmlr);
            System.out.print(">");
            depth--;
            if(depth==1 && rootElement.equals(xmlr.getName()))
            {
                rootElement=null;
                System.out.println("finished element");
            }
            break;
        case XMLStreamConstants.SPACE:
        case XMLStreamConstants.CHARACTERS:
            int start = xmlr.getTextStart();
            int length = xmlr.getTextLength();
            System.out
                    .print(new String(xmlr.getTextCharacters(), start, length));
            break;

        case XMLStreamConstants.PROCESSING_INSTRUCTION:
            System.out.print("<?");
            if (xmlr.hasText())
                System.out.print(xmlr.getText());
            System.out.print("?>");
            break;

        case XMLStreamConstants.CDATA:
            System.out.print("<![CDATA[");
            start = xmlr.getTextStart();
            length = xmlr.getTextLength();
            System.out
                    .print(new String(xmlr.getTextCharacters(), start, length));
            System.out.print("]]>");
            break;

        case XMLStreamConstants.COMMENT:
            System.out.print("<!--");
            if (xmlr.hasText())
                System.out.print(xmlr.getText());
            System.out.print("-->");
            break;

        case XMLStreamConstants.ENTITY_REFERENCE:
            System.out.print(xmlr.getLocalName() + "=");
            if (xmlr.hasText())
                System.out.print("[" + xmlr.getText() + "]");
            break;

        case XMLStreamConstants.START_DOCUMENT:
            System.out.print("<?xml");
            System.out.print(" version='" + xmlr.getVersion() + "'");
            System.out.print(" encoding='" + xmlr.getCharacterEncodingScheme()
                    + "'");
            if (xmlr.isStandalone())
                System.out.print(" standalone='yes'");
            else
                System.out.print(" standalone='no'");
            System.out.print("?>");
            break;

        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        try {
            new LogParser().read();
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    private static void printName(XMLStreamReader xmlr) {
        if (xmlr.hasName()) {
            System.out.print(getName(xmlr));
        }
    }

    private static String getName(XMLStreamReader xmlr) {
        if (xmlr.hasName()) {
            String prefix = xmlr.getPrefix();
            String uri = xmlr.getNamespaceURI();
            String localName = xmlr.getLocalName();
            return getName(prefix, uri, localName);
        }
        return null;
    }

    private static String getName(String prefix, String uri, String localName) {
        String name = "";
        if (uri != null && !("".equals(uri)))
            name += "['" + uri + "']:";
        if (prefix != null)
            name += prefix + ":";
        if (localName != null)
            name += localName;
        return name;
    }   

    private static void printAttributes(XMLStreamReader xmlr) {
        for (int i = 0; i < xmlr.getAttributeCount(); i++) {
            printAttribute(xmlr, i);
        }
    }

    private static void printAttribute(XMLStreamReader xmlr, int index) {
        String prefix = xmlr.getAttributePrefix(index);
        String namespace = xmlr.getAttributeNamespace(index);
        String localName = xmlr.getAttributeLocalName(index);
        String value = xmlr.getAttributeValue(index);
        System.out.print(" ");
        System.out.print(getName(prefix, namespace, localName));
        System.out.print("='" + value + "'");
    }

    private static void printNamespaces(XMLStreamReader xmlr) {
        for (int i = 0; i < xmlr.getNamespaceCount(); i++) {
            printNamespace(xmlr, i);
        }
    }

    private static void printNamespace(XMLStreamReader xmlr, int index) {
        String prefix = xmlr.getNamespacePrefix(index);
        String uri = xmlr.getNamespaceURI(index);
        System.out.print(" ");
        if (prefix == null)
            System.out.print("xmlns='" + uri + "'");
        else
            System.out.print("xmlns:" + prefix + "='" + uri + "'");
    }

}
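The question asks for DOM objects rather than printed events. A sketch of how the same wrapping trick could feed DOM, assuming the stream contains no <?xml ...?> declarations (they become illegal once the documents are nested inside a synthetic root). It uses java.io.SequenceInputStream for the <root>/</root> wrapping instead of the custom XMLStream class and copies each top-level child into its own Document by hand. WrappedDomSplitter and readAll are hypothetical names; comments and processing instructions are dropped.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class WrappedDomSplitter {

    // Wraps the raw stream in a synthetic <root> and hands each top-level child to 'sink'.
    public static void readAll(InputStream raw, Consumer<Document> sink) throws Exception {
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(Arrays.asList(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                raw,
                new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8)))));
        XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(wrapped, "UTF-8");
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = null;
        Node current = null; // node that new children are appended to
        int depth = 0;       // depth 1 == directly inside the synthetic <root>
        while (xr.hasNext()) {
            int event = xr.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (++depth == 1) continue;                  // skip the synthetic <root>
                if (depth == 2) { doc = db.newDocument(); current = doc; }
                Element e = doc.createElementNS(emptyToNull(xr.getNamespaceURI()),
                        qName(xr.getPrefix(), xr.getLocalName()));
                for (int i = 0; i < xr.getAttributeCount(); i++) {
                    e.setAttributeNS(emptyToNull(xr.getAttributeNamespace(i)),
                            qName(xr.getAttributePrefix(i), xr.getAttributeLocalName(i)),
                            xr.getAttributeValue(i));
                }
                current.appendChild(e);
                current = e;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (--depth >= 1) {
                    current = current.getParentNode();
                    if (depth == 1) sink.accept(doc);        // a top-level document is complete
                }
            } else if ((event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA)
                    && depth >= 2) {                         // whitespace between documents is skipped
                current.appendChild(doc.createTextNode(xr.getText()));
            }
        }
        xr.close();
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    private static String qName(String prefix, String localName) {
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }
}
```

Each Document is passed to the callback when the parser reports its end tag, rather than after the whole stream has been read.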

Answer 3 (score: 0)

A simple solution is to wrap the documents in a new root element on the sending side:

<?xml version="1.0"?>
<documents>
    ... document 1 ...
    ... document 2 ...
</documents>

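However, you have to make sure that the individual documents don't include their XML declarations (<?xml ...?>). If all documents use the same encoding, this can be done with a simple filter that ignores the first line of each document if it starts with <?xml.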
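If you do control the sender, the wrapping could look roughly like this sketch. DocumentWrapper and wrap are hypothetical names. As a variant of the first-line filter, it removes only a leading <?xml ...?> declaration with a regular expression, so a root element on the same line as the declaration survives.

```java
import java.util.List;
import java.util.regex.Pattern;

public class DocumentWrapper {
    // Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>.
    private static final Pattern DECLARATION = Pattern.compile("^\\s*<\\?xml\\s[^>]*\\?>");

    // Builds one well-formed document: a single declaration, then every input
    // document (with its own declaration removed) inside a <documents> element.
    public static String wrap(List<String> documents) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<documents>\n");
        for (String doc : documents) {
            sb.append(DECLARATION.matcher(doc).replaceFirst("").trim()).append('\n');
        }
        return sb.append("</documents>\n").toString();
    }
}
```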

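Answer 4 (score: 0)

I found this forum message (which you may already have seen); it has a solution that wraps the input stream and tests for one of two ASCII characters (see the post).

You could try adapting it: first convert to using a Reader (to get correct character decoding), then count elements until you reach the closing root element, at which point you signal end of message.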
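A sketch of the element-counting idea on a Reader, returning each document as a String instead of signalling end of message. DepthCountingSplitter and readDocument are hypothetical names, and the tag scanning is deliberately naive: it assumes no comments, CDATA sections or DOCTYPEs containing '<' or '>', while quoted attribute values containing '>' are handled.

```java
import java.io.IOException;
import java.io.Reader;

public class DepthCountingSplitter {

    // Reads one top-level XML document from 'in' by counting element depth and
    // returns its text, or null if the stream ends before any element starts.
    // Tags starting with "<?" or "<!" (e.g. the XML declaration) do not change depth.
    public static String readDocument(Reader in) throws IOException {
        StringBuilder doc = new StringBuilder();
        int depth = 0;
        boolean sawElement = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c != '<') {
                // Drop whitespace that precedes the first element (e.g. between documents).
                if (sawElement || !Character.isWhitespace(c)) {
                    doc.append((char) c);
                }
                continue;
            }
            // Copy the whole tag, treating '>' inside quoted attribute values as text.
            StringBuilder tag = new StringBuilder("<");
            char quote = 0;
            while ((c = in.read()) != -1) {
                tag.append((char) c);
                if (quote != 0) {
                    if (c == quote) quote = 0;
                } else if (c == '"' || c == '\'') {
                    quote = (char) c;
                } else if (c == '>') {
                    break;
                }
            }
            doc.append(tag);
            if (c == -1) {
                break; // stream ended in the middle of a tag
            }
            char first = tag.charAt(1);
            if (first == '?' || first == '!') {
                continue;
            }
            if (first == '/') {
                depth--;                                     // end tag
            } else if (tag.charAt(tag.length() - 2) != '/') {
                depth++;                                     // start tag that is not self-closing
            }
            sawElement = true;
            if (depth == 0) {
                return doc.toString();
            }
        }
        return doc.toString().trim().isEmpty() ? null : doc.toString();
    }
}
```

Each returned String can then be parsed with `builder.parse(new InputSource(new StringReader(xml)))`, with the socket wrapped as `new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))`.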

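Answer 5 (score: 0)

Hi, I ran into this problem at work as well (so I won't post the code). The most elegant solution I could come up with, and one that works very nicely imo, is as follows.

Create a class, e.g. DocumentSplittingInputStream, which extends InputStream and takes the underlying input stream in its constructor (or has it set after construction...). Add a byte[] field closeTag containing the bytes of the closing root node you are looking for. Add an int field called matchCount or similar, initialized to zero. Add a boolean field called underlyingInputStreamNotFinished, initialized to true.

In the read() implementation:

  1. Check whether matchCount == closeTag.length; if it does, set matchCount to -1 and return -1.
  2. If matchCount == -1, set matchCount = 0, call read() on the underlying input stream until you get -1 or '<' (the start of the XML declaration of the next document on the stream), and return that value. Note that, as far as I know, the XML spec allows comments after the document element, but I knew I wouldn't get any from my source, so there was no reason to handle them; if you can't be sure, you'll need to change the "gobbling" step slightly.
  3. Otherwise, read an int from the underlying input stream (if it equals closeTag[matchCount], increment matchCount; if it doesn't, reset matchCount to zero) and return the newly read byte.
  4. Add a method that returns a boolean indicating whether the underlying stream has finished. All reads on the underlying input stream should go through a separate method that checks whether the value read is -1 and, if so, sets the field underlyingInputStreamNotFinished to false.

    I may have missed a few minor points, but I'm sure you get the picture.

    Then, in the consuming code, if you're using XStream: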

    DocumentSplittingInputStream dsis = new DocumentSplittingInputStream(underlyingInputStream);
    while (dsis.underlyingInputStreamNotFinished()) {
        MyObject mo = xstream.fromXML(dsis);
        mo.doSomething(); // or something.doSomething(mo);
    }
    
    

    David

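Answer 6 (score: 0)

I had to do something like this, and while researching how to handle it I found this thread. Even though it's quite old, I just replied (to myself) here, wrapping everything in its own Reader for simpler use.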

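Answer 7 (score: 0)

I had a similar problem. A web service I'm consuming will (in some cases) return multiple XML documents in response to a single HTTP GET request. I could have read the entire response into a String and split it, but instead I implemented a splitting input stream based on user467257's post above. Here is the code: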

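import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class AnotherSplittingInputStream extends InputStream {
    private final InputStream realStream;
    private final byte[] closeTag;

    private int matchCount;             // how many bytes of closeTag have just been matched
    private boolean realStreamFinished; // the underlying stream has returned -1
    private boolean reachedCloseTag;    // the current document has ended

    public AnotherSplittingInputStream(InputStream realStream, String closeTag) {
        this.realStream = realStream;
        this.closeTag = closeTag.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public int read() throws IOException {
        if (reachedCloseTag) {
            return -1;
        }

        // The whole close tag has been passed through: end the current document.
        if (matchCount == closeTag.length) {
            matchCount = 0;
            reachedCloseTag = true;
            return -1;
        }

        int ch = realStream.read();
        if (ch == -1) {
            realStreamFinished = true;
        } else if (ch == closeTag[matchCount]) {
            matchCount++;
        } else {
            // Restart the match; the mismatching byte may itself begin the close tag.
            // (Sufficient for tags like "</root>" whose first byte does not recur.)
            matchCount = (ch == closeTag[0]) ? 1 : 0;
        }
        return ch;
    }

    // Re-arms the stream for the next document; returns false once the real stream has ended.
    public boolean hasMoreData() {
        if (realStreamFinished) {
            return false;
        }
        reachedCloseTag = false;
        return true;
    }
}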

To use it:

String xml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<root>first root</root>" +
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<root>second root</root>";
ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
AnotherSplittingInputStream splitter = new AnotherSplittingInputStream(is, "</root>");
BufferedReader reader = new BufferedReader(new InputStreamReader(splitter, StandardCharsets.UTF_8));

while (splitter.hasMoreData()) {
    System.out.println("Starting next stream");
    String line = null;
    while ((line = reader.readLine()) != null) {
        System.out.println("line ["+line+"]");
    }
}

Answer 8 (score: 0)

I used a JAXB approach to unmarshal messages from a stream containing multiple messages:

MultiInputStream.java

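import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Wraps the source in a synthetic <root> element. Despite its name this is a
// Reader: an InputStream's read() must return bytes, and passing chars through
// it would corrupt any non-ASCII text.
public class MultiInputStream extends Reader {
    private final Reader source;
    private final StringReader startRoot = new StringReader("<root>");
    private final StringReader endRoot = new StringReader("</root>");

    public MultiInputStream(Reader source) {
        this.source = source;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Drain "<root>", then the source, then "</root>".
        int count = startRoot.read(buf, off, len);
        if (count == -1) {
            count = source.read(buf, off, len);
        }
        if (count == -1) {
            count = endRoot.read(buf, off, len);
        }
        return count;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}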

MultiEventReader.java

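import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class MultiEventReader implements XMLEventReader {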

    private final XMLEventReader reader;
    private boolean isXMLEvent = false;
    private int level = 0;

    public MultiEventReader(XMLEventReader reader) throws XMLStreamException {
        this.reader = reader;
        startXML();
    }

    private void startXML() throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                return;
            }
        }
    }

    public boolean hasNextXML() {
        return reader.hasNext();
    }

    public void nextXML() throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.peek();
            if (event.isStartElement()) {
                isXMLEvent = true;
                return;
            }
            reader.nextEvent();
        }
    }

    @Override
    public XMLEvent nextEvent() throws XMLStreamException {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement()) {
            level++;
        }
        if (event.isEndElement()) {
            level--;
            if (level == 0) {
                isXMLEvent = false;
            }
        }
        return event;
    }

    @Override
    public boolean hasNext() {
        return isXMLEvent;
    }

    @Override
    public XMLEvent peek() throws XMLStreamException {
        XMLEvent event = reader.peek();
        if (level == 0) {
            while (event != null && !event.isStartElement() && reader.hasNext()) {
                reader.nextEvent();
                event = reader.peek();
            }
        }
        return event;
    }

    // Not needed for the unmarshalling in this example.
    @Override
    public String getElementText() throws XMLStreamException {
        throw new UnsupportedOperationException();
    }

    @Override
    public XMLEvent nextTag() throws XMLStreamException {
        throw new UnsupportedOperationException();
    }

    @Override
    public Object getProperty(String name) throws IllegalArgumentException {
        throw new UnsupportedOperationException();
    }

    @Override
    public void close() throws XMLStreamException {
        throw new UnsupportedOperationException();
    }

    @Override
    public Object next() {
        throw new UnsupportedOperationException();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}

Message.java

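import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

@XmlAccessorType(XmlAccessType.FIELD)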
@XmlRootElement(name = "Message")
public class Message {

    public Message() {
    }

    @XmlAttribute(name = "ID", required = true)
    protected long id;

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return "Message{id=" + id + '}';
    }
}

Reading multiple messages:

public static void main(String[] args) throws Exception{

    StringReader stringReader = new StringReader(
            "<Message ID=\"123\" />\n" +
            "<Message ID=\"321\" />"
    );

    JAXBContext context = JAXBContext.newInstance(Message.class);
    Unmarshaller unmarshaller = context.createUnmarshaller();

    XMLInputFactory inputFactory = XMLInputFactory.newFactory();
    MultiInputStream multiInputStream = new MultiInputStream(stringReader);
    XMLEventReader xmlEventReader = inputFactory.createXMLEventReader(multiInputStream);
    MultiEventReader multiEventReader = new MultiEventReader(xmlEventReader);

    while (multiEventReader.hasNextXML()) {
        Object message = unmarshaller.unmarshal(multiEventReader);
        System.out.println(message);
        multiEventReader.nextXML();
    }
}

Result:

Message{id=123}
Message{id=321}