Reading and writing an internal DTD with JDOM

Time: 2016-10-28 22:05:39

Tags: java xml sax dtd jdom

This is a follow-up to the question "Is there some equivalent in Java to Ruby's Nokogiri::XML::EntityDecl?"

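I have a simple DAISY DTBook XML file. (The specific DTD is not important to my question, but this is the actual standard used in legacy books.) It contains XML from both the DTBook and MathML namespaces.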

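Note that the DTD declaration follows the convention I copied from the specification for MathML in DAISY: it uses a combined DTD that references the external DTD of the DTBook standard and adds some internal ENTITY definitions for the MathML standard.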


I read the document below with Java code and print it back out. I first used JDOM 1.1.3 (because of the constraints of the larger project), but I also tried JDOM 2.0.6.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dtbook PUBLIC "-//NISO//DTD dtbook 2005-2//EN"
 "http://www.daisy.org/z3986/2005/dtbook-2005-2.dtd"
 [
  <!ENTITY % MATHML.prefixed "INCLUDE" >
  <!ENTITY % MATHML.prefix "m">
  <!ENTITY % MATHML.Common.attrib
          "xlink:href    CDATA       #IMPLIED
          xlink:type     CDATA       #IMPLIED
          class          CDATA       #IMPLIED
          style          CDATA       #IMPLIED
          id             ID          #IMPLIED
          xref           IDREF       #IMPLIED
          other          CDATA       #IMPLIED
          xmlns:dtbook   CDATA       #FIXED 'http://www.daisy.org/z3986/2005/dtbook/'
          dtbook:smilref CDATA       #IMPLIED"
  >
  <!ENTITY % mathML2 PUBLIC "-//W3C//DTD MathML 2.0//EN"
             "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
  >
  %mathML2;
  <!ENTITY % externalFlow "| m:math">
  <!ENTITY % externalNamespaces "xmlns:m CDATA #FIXED
    'http://www.w3.org/1998/Math/MathML'">
 ]
>
<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" xmlns:m="http://www.w3.org/1998/Math/MathML"
    version="2005-2" xml:lang="eng">
    <head></head>
    <book>
        <frontmatter><doctitle></doctitle></frontmatter>
        <bodymatter>
            <level1>
            <p>Test</p>
                <m:math xmlns:dtbook="http://www.daisy.org/z3986/2005/dtbook/"
                    id="math0001" dtbook:smilref="nativemathml.smil#math0001" altimg="nativemathml0001.png"
                    alttext="sigma-summation UnderScript i equals zero OverScript infinity EndScripts x Subscript i">
                    <m:mrow>
                        <m:mstyle displaystyle='true'>
                            <m:munderover>
                                <m:mo>&#x2211;</m:mo>
                                <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>0</m:mn>
                                </m:mrow>
                                <m:mi>&#x221E;</m:mi>
                            </m:munderover>
                            <m:mrow>
                                <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mi>i</m:mi>
                                </m:msub>
                            </m:mrow>
                        </m:mstyle>
                    </m:mrow>
                </m:math>
            </level1>
        </bodymatter>
        <rearmatter><level1><p></p></level1></rearmatter>
    </book>
</dtbook>

I read it with this test:

@Test
public void buildDTD2()
        throws IOException, JDOMException
{
    final PathMatchingResourcePatternResolver pmrpr = new PathMatchingResourcePatternResolver();
    final File file = pmrpr.getResource("daisy/mathmldtdtemplate.xml").getFile();
    final String uri = file.toURI().toString();
    final InputStream stream = new BufferedInputStream(new FileInputStream(file));
    final SAXBuilder saxBuilder = new SAXBuilder();

    saxBuilder.setValidation(true);
    saxBuilder.setFeature("http://apache.org/xml/features/validation/schema", true);

    final InputSource source = new InputSource(new BufferedInputStream(stream));
    source.setSystemId(uri);
    final Document doc = saxBuilder.build(source);

    String xml2 = new XMLOutputter().outputString(doc);
    System.out.println(xml2);
    System.out.println("Internal Subset: " + doc.getDocType().getInternalSubset());
}

The final System.out.println, which calls getInternalSubset(), prints nothing for the internal subset. When I print out the whole document, the internal subset is missing there too.

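The ENTITY definitions are gone! Am I missing some option that would let me keep them? How can I preserve them? When we process these files, we may need to read them in and write them out several times without losing this DTD.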

1 Answer:

Answer 0: (score: 0)

After further research, I found a solution on the jdom-interest list.

According to Laurent Bihanic, adding the statement saxBuilder.setExpandEntities(false); forces the DeclHandler to be registered:

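// Imports assume JDOM 2.x; for JDOM 1.1.3 use the org.jdom packages instead.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.XMLOutputter;
import org.junit.Test;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.xml.sax.InputSource;

@Test
public void buildDTD2()
        throws IOException, JDOMException
{
    // Locate the test document on the classpath.
    final PathMatchingResourcePatternResolver pmrpr = new PathMatchingResourcePatternResolver();
    final File file = pmrpr.getResource("daisy/mathmldtdtemplate.xml").getFile();
    final String uri = file.toURI().toString();
    final InputStream stream = new BufferedInputStream(new FileInputStream(file));
    final SAXBuilder saxBuilder = new SAXBuilder();

    saxBuilder.setValidation(true);
    saxBuilder.setFeature("http://apache.org/xml/features/validation/schema", true);

    // The fix: with entity expansion off, the internal subset is kept.
    saxBuilder.setExpandEntities(false);

    final InputSource source = new InputSource(new BufferedInputStream(stream));
    // The system ID lets the parser resolve relative references in the DTD.
    source.setSystemId(uri);
    final Document doc = saxBuilder.build(source);

    String xml2 = new XMLOutputter().outputString(doc);
    System.out.println(xml2);
    System.out.println("Internal Subset: " + doc.getDocType().getInternalSubset());
}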

This works; the internal subset is now read in and printed out after "Internal Subset:".
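
For readers who want to see what a DeclHandler does, here is a minimal sketch that uses only the SAX API bundled with the JDK, not JDOM. It registers a handler under the standard declaration-handler property and records each internal ENTITY declaration the parser reports. The class name InternalSubsetDemo, the method collectEntityDecls, and the sample document are made up for illustration. It shows the SAX callback only and makes no claim about how SAXBuilder wires it up internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class InternalSubsetDemo {

    // Hypothetical sample: internal subset only, so parsing needs no network access.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE doc [\n"
        + "  <!ENTITY % MATHML.prefix \"m\">\n"
        + "  <!ENTITY greeting \"hello\">\n"
        + "]>\n"
        + "<doc>&greeting;</doc>";

    // Parses the XML and returns "name=value" for every internal entity declaration.
    static List<String> collectEntityDecls(String xml) throws Exception {
        final List<String> decls = new ArrayList<>();

        // DefaultHandler2 implements DeclHandler with no-op methods; override one.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void internalEntityDecl(String name, String value) {
                // Parameter entity names are reported with a leading '%'.
                decls.add(name + "=" + value);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Without this property the parser has nowhere to send DTD declarations.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return decls;
    }

    public static void main(String[] args) throws Exception {
        for (String decl : collectEntityDecls(SAMPLE)) {
            System.out.println(decl);
        }
    }
}
```

If no handler is registered for that property, the declarations are parsed but never reported to the application. This matches the symptom in the question, where the internal subset came back empty.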