Inconsistent XSD validation of nested elements using <xs:any>

Question

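I'm working on a tool that helps users author XHTML-ish documents similar in nature to JSP files. The documents are XML and can contain any well-formed markup in the XHTML namespace, with elements from my product's namespace woven in between. Among other things, the tool validates its input using XSD.

Example input: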

<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <c:paragraph>
        <span>This is a test!</span>
        <a href="http://www.google.com/">click here for more!</a>
      </c:paragraph>
    </c:section>
  </html>
</markup>

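My problem is that the XSD validation behaves inconsistently depending on how deeply I nest elements. What I want is for every element in the https://my_tag_lib.example.com/ namespace to be checked against the schema, while any element in the http://www.w3.org/1999/xhtml namespace is liberally tolerated. I would rather not list every permitted HTML element in my XSD, since users may want to use obscure elements that are only available in certain browsers. Instead, I'd like to whitelist any element belonging to a namespace using <xs:any>.

What I'm finding is that, in some circumstances, elements that belong to the my_tag_lib namespace but do not appear in the schema pass validation, while other elements that do appear in the schema can be made to fail by giving them invalid attributes.

So:
* valid elements are validated against the XSD schema
* invalid elements are skipped by the validator?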

For example, this passes validation:

<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <div>
        <c:my-invalid-element>This is a test</c:my-invalid-element>
      </div>
    </c:section>
  </html>
</markup>

But this fails validation:

<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <div>
        <c:paragraph my-invalid-attr="true">This is a test</c:paragraph>
      </div>
    </c:section>
  </html>
</markup>

Why are attributes validated against the schema for recognised elements, while unrecognised elements seemingly aren't checked at all? What is the logic here? I have been validating with xmllint:

xmllint --schema markup.xsd example.xml

Here are my XSD files:

File: markup.xsd

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xs:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="html.xsd" />
  <xs:element name="markup">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="xhtml:html" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

File: html.xsd

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/1999/xhtml">
  <xs:import namespace="https://my_tag_lib.example.com/" schemaLocation="my_tag_lib.xsd" />
  <xs:element name="html">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

File: my_tag_lib.xsd

(contents not shown)

Answer 1

What you're missing is an understanding of the context-determined declaration.

First, take a look at this little experiment.

<?xml version="1.0"?>
<markup>
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
        <c:section>
            <div>
                <html>
                    <c:my-invalid-element>This is a test</c:my-invalid-element>
                </html>
            </div>
        </c:section>
    </html>
</markup>

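This is the same as your valid example, except that I have changed the context in which c:my-invalid-element is assessed from "lax" to "strict". I did that by inserting an html element, which forces every element in your tag namespace beneath it to be assessed strictly. As you can easily confirm, the document above is invalid.

This tells you (without reading the documentation) that in your example the determined context must be "lax", rather than the "strict" you expected.

Why is the context not strict? The div is processed laxly (it matches the wildcard but has no definition), so its children are assessed laxly as well. Lax means validate where a declaration can be found. In the first case, no definition can be found for c:my-invalid-element, so the instruction that applies is "don't worry if you can't", and all is well. In the invalid sample, a definition can be found for c:paragraph, so "it must be ·valid· with respect to that definition", and it is not, because of the unexpected attribute.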
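The rule above can be sketched as a toy model in Python. This is not a real XSD validator and it is not how xmllint is implemented; it only mimics the lax/strict decision. It assumes that section and paragraph are the only elements declared in my_tag_lib.xsd, that they take no attributes, and that every declared element uses the same two wildcards as html.xsd. The names `DECLARED`, `WILDCARDS`, `assess`, `validate` and `wrap` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
TAGLIB = "https://my_tag_lib.example.com/"

# Toy stand-in for the global element declarations: tag -> allowed attributes.
# Assumption: section and paragraph are declared and take no attributes.
DECLARED = {
    "markup": set(),
    "{%s}html" % XHTML: set(),
    "{%s}section" % TAGLIB: set(),
    "{%s}paragraph" % TAGLIB: set(),
}

# Assumption: every declared element uses the same wildcards as html.xsd.
WILDCARDS = {XHTML: "lax", TAGLIB: "strict"}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or '' if it has none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def assess(elem, mode, errors):
    """Assess elem; mode is the processContents of the wildcard it matched."""
    allowed = DECLARED.get(elem.tag)
    if allowed is None:
        if mode == "strict":
            errors.append("no declaration for " + elem.tag)
            return
        # Lax and undeclared: don't worry, and assess the children laxly too.
        for child in elem:
            assess(child, "lax", errors)
        return
    # A declaration was found, so the element must be valid against it,
    # whether the surrounding context was lax or strict.
    for name in elem.attrib:
        if name not in allowed:
            errors.append("attribute %s not allowed on %s" % (name, elem.tag))
    for child in elem:
        child_mode = WILDCARDS.get(namespace_of(child.tag))
        if child_mode is None:
            errors.append("%s not allowed in %s" % (child.tag, elem.tag))
        else:
            assess(child, child_mode, errors)


def validate(xml_text):
    """Return the list of errors the toy model finds in xml_text."""
    errors = []
    assess(ET.fromstring(xml_text), "strict", errors)
    return errors


def wrap(inner):
    """Place inner inside markup > html > c:section, as in the question."""
    return (
        '<markup><html xmlns="%s" xmlns:c="%s"><c:section>%s'
        "</c:section></html></markup>" % (XHTML, TAGLIB, inner)
    )


PASSES = wrap("<div><c:my-invalid-element>This is a test</c:my-invalid-element></div>")
FAILS_ATTR = wrap('<div><c:paragraph my-invalid-attr="true">This is a test</c:paragraph></div>')
FAILS_NESTED = wrap(
    "<div><html><c:my-invalid-element>This is a test</c:my-invalid-element></html></div>"
)

for name, doc in [("passes", PASSES), ("attr", FAILS_ATTR), ("nested", FAILS_NESTED)]:
    print(name, validate(doc))
```

Under these assumptions the first document produces no errors, while the other two produce one error each, which matches the behaviour described in the question and in the experiment above.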

Answer 2

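The div element is not declared, so nothing in your schema stops it from accepting undeclared content, whereas the paragraph element is declared and does not allow my-invalid-attr.

Perhaps some examples will make this clearer.

If an element is declared (e.g. html, section, paragraph) and its content comes from the taglib namespace (which you declared with processContents="strict"), that content is treated as strict. This means its attributes and child elements must be declared. This should fail validation: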

<html>
    <c:my-invalid-element>This is a test</c:my-invalid-element>
</html>

So should this:

<c:section>
    <c:my-invalid-element>This is a test</c:my-invalid-element>
</c:section>

And this:

<div>
    <c:paragraph>
         <c:my-invalid-element>This is a test</c:my-invalid-element>
    </c:paragraph>
</div>

And this (because attributes are part of the content):

<c:paragraph my-invalid-attr="true">This is a test</c:paragraph>

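But if an element is not declared (e.g. div), it matches the xs:any declaration. No declaration restricts the content of div, so it allows anything. So this should pass validation: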

<div>
    <c:my-invalid-element>This is a test</c:my-invalid-element>
</div>

Since c:my-invalid-element is also undeclared, it allows any content or attributes. This is valid:

<div>
    <c:my-invalid-element invalid-attribute="hi"> <!-- VALID -->
        <c:invalid></c:invalid>
        <html></html>
    </c:my-invalid-element>
</div>

But if you put the invalid element inside html, it fails:

<div>
    <c:my-invalid-element invalid-attribute="hi">
        <html><c:invalid></c:invalid></html>  <!-- NOT VALID -->
    </c:my-invalid-element>
</div>

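The same happens if you use an undeclared attribute on a declared element (one that is not simply matched by xs:any), no matter how deeply it is nested: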

<div>
    <c:my-invalid-element invalid-attribute="hi"> <!-- VALID -->
        <c:invalid>
            <b> 
                <c:section bad-attribute="boo"></c:section> <!-- FAILS! -->
 ...
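To see which elements in a document fall into the "undeclared, so anything goes" category described above, a small script can list the taglib elements that have no declaration. This is only a sketch of a supplementary check, not something xmllint does. The set `DECLARED_TAGLIB` is an assumption based on the two element names mentioned in this answer, and `undeclared_taglib_elements` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TAGLIB = "https://my_tag_lib.example.com/"

# Assumption: the only elements declared in my_tag_lib.xsd are these two.
DECLARED_TAGLIB = {"section", "paragraph"}


def undeclared_taglib_elements(xml_text):
    """Return local names of taglib elements without a declaration, in document order."""
    prefix = "{" + TAGLIB + "}"
    root = ET.fromstring(xml_text)
    return [
        elem.tag[len(prefix):]
        for elem in root.iter()
        if elem.tag.startswith(prefix)
        and elem.tag[len(prefix):] not in DECLARED_TAGLIB
    ]


SAMPLE = (
    '<div xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:c="https://my_tag_lib.example.com/">'
    '<c:my-invalid-element invalid-attribute="hi">'
    "<c:invalid></c:invalid><html></html>"
    "</c:my-invalid-element></div>"
)

print(undeclared_taglib_elements(SAMPLE))
```

Any name this reports is an element that validates only because it sits in a lax context; the same element placed directly inside a declared element such as html or c:section would be rejected.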
