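How do I use Apache Daffodil's DataProcessor.unparse() method to reconstruct the original parsed message?

Time: 2019-07-19 16:13:21

Tags: java parsing dfdl

I am a beginner with Apache Daffodil.

I used the Daffodil Java API to successfully parse an input text message into an XML string: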

        Compiler dfdlCompiler = Daffodil.compiler();
        dfdlCompiler.setValidateDFDLSchemas(true);
        File schemaFile = this.getFileFromResources("EDIFACT-SupplyChain-D03B/EDIFACT-SupplyChain-Messages-D.03B.xsd");
        ProcessorFactory processorFactory = dfdlCompiler.compileFile(schemaFile);
        DataProcessor dataProcessor = processorFactory.onPath("/");
        java.io.File file = getFileFromResources("TestData/ORDERS_D.03B_Interchange.txt");
        java.io.FileInputStream fis = new java.io.FileInputStream(file);
        InputSourceDataInputStream dis = new InputSourceDataInputStream(fis);
        JDOMInfosetOutputter outputter = new JDOMInfosetOutputter();
        ParseResult parseResult = dataProcessor.parse(dis, outputter);
        Document doc = outputter.getResult().getDocument();
        XMLOutputter xo = new XMLOutputter(org.jdom2.output.Format.getPrettyFormat());
        String xmlString = xo.outputString(doc);

        System.out.println("parsed text... resulting xmlString=" + xmlString);

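However, it is not clear to me how to use the unparse() method to reconstruct the original text message (there seem to be no examples of unparsing with Daffodil's Java API to rebuild the original message).

This is what I tried: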

        SAXBuilder builder = new SAXBuilder();
        Document d2 = builder.build(new StringReader(xmlString));
        JDOMInfosetInputter inputter = new JDOMInfosetInputter(d2);
        WritableByteChannel output = Channels.newChannel(new DataOutputStream(new ByteArrayOutputStream()));
        UnparseResult result = dataProcessor.unparse(inputter, output);

How do I extract the original message? Or is this approach incorrect?

Apache Daffodil version: 2.3

Java version: JDK 8+


I am testing with this simplified Java application...

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import org.jdom2.Document;
import org.jdom2.output.XMLOutputter;

import org.apache.daffodil.japi.Compiler;
import org.apache.daffodil.japi.Daffodil;
import org.apache.daffodil.japi.DataProcessor;
import org.apache.daffodil.japi.ParseResult;
import org.apache.daffodil.japi.ProcessorFactory;
import org.apache.daffodil.japi.UnparseResult;
import org.apache.daffodil.japi.infoset.JDOMInfosetInputter;
import org.apache.daffodil.japi.infoset.JDOMInfosetOutputter;
import org.apache.daffodil.japi.io.InputSourceDataInputStream;
import org.jdom2.input.SAXBuilder;

public class Blah2 {

    public static void main(String[] args) throws IOException, Exception {
        Blah2 b = new Blah2();
        b.process();
    }

    private void process() throws IOException, Exception {

        Compiler dfdlCompiler = Daffodil.compiler();
        dfdlCompiler.setValidateDFDLSchemas(true);
        File schemaFile = this.getFileFromResources("EDIFACT-SupplyChain-D03B/EDIFACT-SupplyChain-Messages-D.03B.xsd");
        ProcessorFactory processorFactory = dfdlCompiler.compileFile(schemaFile);
        DataProcessor dataProcessor = processorFactory.onPath("/");
        java.io.File file = getFileFromResources("TestData/ORDERS_D.03B_Interchange.txt");
        java.io.FileInputStream fis = new java.io.FileInputStream(file);
        InputSourceDataInputStream dis = new InputSourceDataInputStream(fis);
        JDOMInfosetOutputter outputter = new JDOMInfosetOutputter();
        ParseResult parseResult = dataProcessor.parse(dis, outputter);
        Document doc = outputter.getResult().getDocument();
        XMLOutputter xo = new XMLOutputter(org.jdom2.output.Format.getPrettyFormat());
        String xmlString = xo.outputString(doc);

        System.out.println("parsed text... resulting xmlString=" + xmlString);

        SAXBuilder builder = new SAXBuilder();
        Document d2 = builder.build(new StringReader(xmlString));
        JDOMInfosetInputter inputter = new JDOMInfosetInputter(d2);
        WritableByteChannel output = Channels.newChannel(new DataOutputStream(new ByteArrayOutputStream()));
        UnparseResult result = dataProcessor.unparse(inputter, output);

        System.out.println("unparsed xml document.. result.toString()=" + String.valueOf(result));        

        //how can I obtain the original input text???
    }

    private File getFileFromResources(String fileName) throws IOException {
        URL resource = this.getClass().getClassLoader().getResource(fileName);
        return new File(resource.getFile());
    }
}   

The output of the parse operation is shown below.

(I still don't know how to do the reverse, i.e. the "unparse".)

parsed text... resulting xmlString=<?xml version="1.0" encoding="UTF-8"?>
<D03B:Interchange xmlns:D03B="http://www.ibm.com/dfdl/edi/un/edifact/SupplyChain/D03B">
  <UNB>
    <S001>
      <E0001>UNOA</E0001>
      <E0002>4</E0002>
    </S001>
    <S002>
      <E0004>APPLICATION</E0004>
      <E0007>1</E0007>
    </S002>
    <S003>
      <E0010>COMPANY</E0010>
      <E0007>1</E0007>
    </S003>
    <S004>
      <E0017>20051107</E0017>
      <E0019>1159</E0019>
    </S004>
    <E0020>6002</E0020>
  </UNB>
  <D03B:Message>
    <UNH>
      <E0062>SSDD1</E0062>
      <S009>
        <E0065>ORDERS</E0065>
        <E0052>D</E0052>
        <E0054>03B</E0054>
        <E0051>UN</E0051>
        <E0057>EAN008</E0057>
      </S009>
    </UNH>
    <D03B:BadMessage>
      <Segment>
        <Name>BGM</Name>
        <Data>2B3232302B424B4F4439392B39</Data>
      </Segment>
      <Segment>
        <Name>DTM</Name>
        <Data>2B3133373A32303035313130373A313032</Data>
      </Segment>
      <Segment>
        <Name>NAD</Name>
        <Data>2B42592B353431323334353030303137363A3A39</Data>
      </Segment>
      <Segment>
        <Name>NAD</Name>
        <Data>2B53552B343031323334353030303039343A3A39</Data>
      </Segment>
      <Segment>
        <Name>CTA</Name>
        <Data>2B4141</Data>
      </Segment>
      <Segment>
        <Name>COM</Name>
        <Data>2B7331313A41412A7332313A41412A7333313A4141</Data>
      </Segment>
      <Segment>
        <Name>LIN</Name>
        <Data>2B312B312B303736343536393130343A4942</Data>
      </Segment>
      <Segment>
        <Name>QTY</Name>
        <Data>2B313A3235</Data>
      </Segment>
      <Segment>
        <Name>FTX</Name>
        <Data>2B41464D2B312B2B4C6F7264206F66207468652052696E6773</Data>
      </Segment>
      <Segment>
        <Name>LIN</Name>
        <Data>2B322B312B303736343536393039303A4942</Data>
      </Segment>
      <Segment>
        <Name>QTY</Name>
        <Data>2B313A3235</Data>
      </Segment>
      <Segment>
        <Name>FTX</Name>
        <Data>2B41464D2B312B2B54686520486F62626974</Data>
      </Segment>
      <Segment>
        <Name>LIN</Name>
        <Data>2B332B312B313836313030343635363A4942</Data>
      </Segment>
      <Segment>
        <Name>QTY</Name>
        <Data>2B313A3136</Data>
      </Segment>
      <Segment>
        <Name>FTX</Name>
        <Data>2B41464D2B312B2B5468652053696C6D6172696C6C696F6E</Data>
      </Segment>
      <Segment>
        <Name>LIN</Name>
        <Data>2B342B312B303539363030363735363A4942</Data>
      </Segment>
      <Segment>
        <Name>QTY</Name>
        <Data>2B313A3130</Data>
      </Segment>
      <Segment>
        <Name>FTX</Name>
        <Data>2B41464D2B312B2B546865204368696C6472656E206F6620487572696E</Data>
      </Segment>
      <Segment>
        <Name>UNS</Name>
        <Data>2B53</Data>
      </Segment>
      <Segment>
        <Name>CNT</Name>
        <Data>2B323A34</Data>
      </Segment>
    </D03B:BadMessage>
    <UNT>
      <E0074>22</E0074>
      <E0062>SSDD1</E0062>
    </UNT>
  </D03B:Message>
  <UNZ>
    <E0036>1</E0036>
    <E0020>6002</E0020>
  </UNZ>
</D03B:Interchange>

unparsed xml document.. result.toString()=org.apache.daffodil.japi.UnparseResult@2e734540

1 answer:

Answer 0 (score: 2)

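The UnparseResult doesn't actually contain the unparsed result (yes, maybe we could have named it better ;). It only tells you whether the unparse succeeded (via the isError method) and carries any diagnostics on failure. The unparsed data is written to the WritableByteChannel that you pass as an argument to unparse().

The problem is that in your case you define the channel like this:

        WritableByteChannel output = Channels.newChannel(new DataOutputStream(new ByteArrayOutputStream()));

So the channel you define does write to the underlying ByteArrayOutputStream, but because that stream is never assigned to a variable, you have no way to get at the bytes. All you really need to do is assign the ByteArrayOutputStream to a variable, pass it to the new channel, and then read the byte array from it after the unparse (for example with toByteArray() or toString()).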
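
The capture pattern described in this answer can be sketched with the JDK alone. In this minimal sketch the class name ChannelCaptureSketch and the helper writeTo are hypothetical; writeTo merely stands in for dataProcessor.unparse(inputter, output), on the assumption that unparse writes its bytes to the channel it is given, as the answer states. The UTF-8 charset is also an assumption; use whatever encoding your DFDL schema specifies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChannelCaptureSketch {

    // Hypothetical stand-in for dataProcessor.unparse(inputter, output):
    // all it does is write bytes to the channel it receives.
    static void writeTo(WritableByteChannel output, String text) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            output.write(buf);
        }
    }

    static String capture(String text) throws IOException {
        // Keep a reference to the stream so the bytes stay reachable.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        WritableByteChannel output = Channels.newChannel(bos);

        // In the question's code this call would be:
        //   UnparseResult result = dataProcessor.unparse(inputter, output);
        writeTo(output, text);

        // Read back what was written through the channel.
        // UTF-8 is an assumption; match the encoding of your data.
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(capture("hello"));
    }
}
```

In the question's code, the same change would mean keeping a reference to the ByteArrayOutputStream, passing it to Channels.newChannel(), and reading it after dataProcessor.unparse(inputter, output) returns. The DataOutputStream wrapper is not needed for this.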

Also, a good resource for using the Daffodil Java API is our Java API tests:

https://github.com/apache/incubator-daffodil/blob/master/daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java

They include examples that use a ByteArrayOutputStream and a WritableByteChannel to unparse to bytes and convert the result to a string.