How do I correctly validate an XML file against a local DTD file with Nokogiri?

Time: 2016-04-27 13:07:25

Tags: ruby xml validation nokogiri

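I have a simple, valid DTD and a valid XML file that appears to conform to it, but Nokogiri produces a lot of validation output, which means the XML file is failing validation.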

The DTD file is:

<!ELEMENT protocol (copyright?, description?, interface+)>
  <!ATTLIST protocol name CDATA #REQUIRED>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT interface (description?,(request|event|enum)+)>
  <!ATTLIST interface name CDATA #REQUIRED>
  <!ATTLIST interface version CDATA #REQUIRED>
<!ELEMENT request (description?,arg*)>
  <!ATTLIST request name CDATA #REQUIRED>
  <!ATTLIST request type CDATA #IMPLIED>
  <!ATTLIST request since CDATA #IMPLIED>
<!ELEMENT event (description?,arg*)>
  <!ATTLIST event name CDATA #REQUIRED>
  <!ATTLIST event since CDATA #IMPLIED>
<!ELEMENT enum (description?,entry*)>
  <!ATTLIST enum name CDATA #REQUIRED>
  <!ATTLIST enum since CDATA #IMPLIED>
  <!ATTLIST enum bitfield CDATA #IMPLIED>
<!ELEMENT entry (description?)>
  <!ATTLIST entry name CDATA #REQUIRED>
  <!ATTLIST entry value CDATA #REQUIRED>
  <!ATTLIST entry summary CDATA #IMPLIED>
  <!ATTLIST entry since CDATA #IMPLIED>
<!ELEMENT arg (description?)>
  <!ATTLIST arg name CDATA #REQUIRED>
  <!ATTLIST arg type CDATA #REQUIRED>
  <!ATTLIST arg summary CDATA #IMPLIED>
  <!ATTLIST arg interface CDATA #IMPLIED>
  <!ATTLIST arg allow-null CDATA #IMPLIED>
  <!ATTLIST arg enum CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
  <!ATTLIST description summary CDATA #REQUIRED>

The XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE protocol SYSTEM "wayland.dtd">
<protocol name="wayland">

  <copyright>
    FOO
    SOFTWARE.
  </copyright>

  <interface name="wl_display" version="1">
    <description summary="core global object">
      The core global object.  This is a special singleton object.  It
      is used for internal Wayland protocol features.
    </description>

    <request name="sync">
      <description summary="asynchronous roundtrip">
    The sync request asks the server to emit the 'done' event
    on the returned wl_callback object.  Since requests are
    handled in-order and events are delivered in-order, this can
    be used as a barrier to ensure all previous requests and the
    resulting events have been handled.

    The object returned by this request will be destroyed by the
    compositor after the callback is fired and as such the client must not
    attempt to use it after that point.

    The callback_data passed in the callback is the event serial.
      </description>
      <arg name="callback" type="new_id" interface="wl_callback"/>
    </request>
  </interface>

</protocol>

My simple Ruby program is:

require 'nokogiri'

DTD_PATH = "wayland.dtd"
XML_PATH = "wayland.xml"

dtd_doc = Nokogiri::XML::Document.parse(open(DTD_PATH))
dtd = Nokogiri::XML::DTD.new('protocol', dtd_doc)
doc = Nokogiri::XML(open(XML_PATH))
puts dtd.validate(doc)

The program prints the contents of the validation array, which is not empty. Sample output:

No declaration for attribute name of element request
No declaration for element description
No declaration for attribute summary of element description

Even after adding a DOCTYPE declaration to the XML file, like this:

<!DOCTYPE protocol SYSTEM "wayland.dtd">

and wrapping the DTD in a DOCTYPE declaration:

<!DOCTYPE protocol [
...
]>

I still see the same failing validation output. What am I doing wrong?

1 Answer:

Answer 0: (score: 2)

You can validate by specifying ParseOptions. The XML file needs to name its DTD with the doctype declaration <!DOCTYPE protocol SYSTEM "wayland.dtd">:

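require 'nokogiri'

DTD_PATH = "wayland.dtd"  # referenced by the DOCTYPE in the XML file
XML_PATH = "wayland.xml"

xml = File.read(XML_PATH)
# DTDVALID makes the parser load the DTD named in the DOCTYPE and validate against it
options = Nokogiri::XML::ParseOptions::DTDVALID
doc = Nokogiri::XML::Document.parse(xml, nil, nil, options)
# validate returns an array of errors; an empty array means the document is valid
puts doc.external_subset.validate(doc)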