SchematronParseError: invalid schematron schema (for the ISOSTS schema)

Asked: 2017-10-16 10:16:29

Tags: python lxml schematron

I am trying to validate a document with Schematron.

I am using the schema for the ISOSTS standard.

from lxml import etree
from lxml.isoschematron import Schematron


def validate(filename: str) -> bool:
    # Path to the ISOSTS Schematron schema.
    schema_source = '/path/to/ISOSTS_validation.sch'

    # FIXME: this inline schema works, but the ISOSTS schema fails.
    # (Needs `from io import StringIO`.)
    # schema_source = StringIO('''\
    #     <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    #       <pattern id="sum_equals_100_percent">
    #         <title>Sum equals 100%.</title>
    #         <rule context="Total">
    #           <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    #         </rule>
    #       </pattern>
    #     </schema>
    # ''')

    # etree.parse accepts a path or a file-like object, so no manual open/close is needed.
    sct_doc = etree.parse(schema_source)
    schematron = Schematron(sct_doc)       # <- FAILS here with SchematronParseError

    # Parse the document and validate it against the compiled schema.
    doc = etree.parse(filename)
    return schematron.validate(doc)


validate('/path/to/feature_doc.xml')

Error message:

File "/var/www/.../venv/lib/python3.5/site-packages/lxml/isoschematron/__init__.py", line 279, in __init__
    schematron_schema_valid.error_log)
lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param
<string>:560:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element schema, got variable
<string>:0:0:ERROR:RELAXNGV:RELAXNG_ERR_INTEREXTRA: Extra element function in interleave
<string>:42:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element schema failed to validate content

How can I fix this?

2 answers:

Answer 0 (score: 1)

I'm not sure how helpful this is, but I don't think the problem is in your code. I think the problem is that lxml does not support XSLT 2.0.

The schema you are using requires an ISO Schematron implementation that supports the XSLT 2.0 query binding [1].

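Opening the schema in Oxygen and removing the queryBinding="xslt2" attribute produces a large number of problems. Among them is a validation error at line 553 (<xsl:param name="num-cols" as="xs:integer"/>): the 'as' attribute is not allowed on this element. That is the same line at which lxml raises its parse error [2].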

lxml does not implement XSLT 2.0, and its documentation states explicitly that it supports only the "pure-XSLT-1.0 skeleton implementation" of Schematron (see http://lxml.de/validation.html#id2).
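A quick way to spot this situation before handing a schema to lxml is to read the queryBinding attribute from the schema's root element. The following is only a minimal sketch using the standard library; the helper names query_binding and needs_xslt2 are made up for illustration, and the fallback to "xslt" when the attribute is absent is an assumption about the Schematron default.

```python
import xml.etree.ElementTree as ET

# Namespace of ISO Schematron, as used in the schema's root element.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"


def query_binding(schema_xml: str) -> str:
    """Return the lower-cased queryBinding declared on the Schematron root."""
    root = ET.fromstring(schema_xml)
    if root.tag != f"{{{SCH_NS}}}schema":
        raise ValueError("root element is not an ISO Schematron <schema>")
    # Assumption: a schema without the attribute uses the XSLT 1.0 binding.
    return root.get("queryBinding", "xslt").lower()


def needs_xslt2(schema_xml: str) -> bool:
    """True if the schema declares the XSLT 2.0 query binding."""
    return query_binding(schema_xml) == "xslt2"
```

For a schema stored on disk, the same check could be applied to the text read from the file; a result of True suggests that lxml's XSLT 1.0 based implementation is unlikely to accept the schema.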

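You will probably not get this to work with lxml. As far as I know, there is no XSLT 2.0-compatible XML parser for Python yet (if anyone knows of one, that would be great).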

It is a bit of a hack, but you could run the validation with an external tool (perhaps crux + libsaxon) through a subprocess. That may be the only solution.
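The subprocess approach could look roughly like the sketch below. It is deliberately tool-agnostic: the command line is passed in by the caller, because the actual arguments depend on whichever validator is installed, and the function name run_external_validator is hypothetical. It also assumes that the tool signals "valid" with exit status 0, which should be checked against that tool's documentation.

```python
import subprocess
import sys
from typing import Sequence


def run_external_validator(command: Sequence[str], timeout: float = 60.0) -> bool:
    """Run an external validator command and report whether it succeeded.

    Assumption: the tool exits with status 0 when the document is valid.
    """
    completed = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    if completed.returncode != 0:
        # Surface whatever the tool reported so the failure can be diagnosed.
        print(completed.stderr or completed.stdout, file=sys.stderr)
    return completed.returncode == 0
```

The caller would build the command list for the chosen tool, including the schema and document paths, and treat a False result as a validation failure.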

[1] Line 35 of the linked schema: <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"

[2] lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param

Answer 1 (score: 0)

Solved by using the XSD schema from the archive with lxml.etree.XMLSchema:

from lxml import etree


def validate(filename: str) -> bool:
    schema_filename = '/path/to/ISOSTS.xsd'

    # Parse the XSD and compile it into a validator.
    xsd_doc = etree.parse(schema_filename)
    xmlschema = etree.XMLSchema(xsd_doc)

    # Parse the document and validate it against the schema.
    doc = etree.parse(filename)
    return xmlschema.validate(doc)