I am trying to validate a document with Schematron.
I am using the schema for the ISOSTS standard.
from io import StringIO  # only needed for the commented-out inline schema

from lxml import etree
from lxml.isoschematron import Schematron


def validate(filename: str):
    schema_filename = '/path/to/ISOSTS_validation.sch'
    # FIXME: parsing the inline schema below works, but the ISOSTS schema fails.
    # schema_file = StringIO('''\
    # <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    # <pattern id="sum_equals_100_percent">
    # <title>Sum equals 100%.</title>
    # <rule context="Total">
    # <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    # </rule>
    # </pattern>
    # </schema>
    # ''')
    with open(schema_filename, 'rb') as schema_file:
        sct_doc = etree.parse(schema_file)
    schematron = Schematron(sct_doc)  # <- FAILS here
    with open(filename, 'rb') as file:
        doc = etree.parse(file)
    return schematron.validate(doc)


validate('/path/to/feature_doc.xml')
Error message:
File "/var/www/.../venv/lib/python3.5/site-packages/lxml/isoschematron/__init__.py", line 279, in __init__
schematron_schema_valid.error_log)
lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param
<string>:560:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element schema, got variable
<string>:0:0:ERROR:RELAXNGV:RELAXNG_ERR_INTEREXTRA: Extra element function in interleave
<string>:42:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element schema failed to validate content
How can I fix this?
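Answer 0 (score: 1)
I'm not sure how helpful this is, but I don't think the problem is in your code. I think the problem is that lxml does not support XSLT 2.0.
The schema you are using requires ISO Schematron conforming to the 2010 XSLT-2 standard; it declares queryBinding="xslt2" [1].
Opening the schema in Oxygen and removing the queryBinding="xslt2" attribute produces a large number of problems. These include a validation error on line 553 (<xsl:param name="num-cols" as="xs:integer"/>): the "as" attribute is not allowed on this element. That is the same line on which lxml raises its parse error [2].
lxml does not implement XSLT 2.0 and explicitly states that it supports only the "pure-XSLT-1.0 skeleton implementation" of Schematron (see http://lxml.de/validation.html#id2).
You will probably not succeed with lxml. As far as I know, there is not yet an XSLT-2-compatible Python XML parser (if anyone knows of one, that would be great).
It is a bit of a hack, but you could use subprocess to run the validation with an external tool (perhaps crux + libsaxon). That may be the only solution.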
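The subprocess approach could look roughly like the sketch below. It is only a wrapper: the function name run_external_validator is made up for illustration, and it assumes that whatever XSLT-2-capable tool you choose exits with status 0 when the document is valid. The actual command line depends entirely on that tool and is not shown here.

```python
import subprocess
from typing import List, Tuple


def run_external_validator(command: List[str], timeout: float = 60.0) -> Tuple[bool, str]:
    """Run an external validation command and return (passed, output).

    Assumption: the tool signals a valid document with exit status 0.
    `command` is the full argument list for your validator, for example
    [<validator executable>, <schema path>, <document path>] (hypothetical).
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the report is in one string
        universal_newlines=True,   # decode output as text
        timeout=timeout,
    )
    return completed.returncode == 0, completed.stdout
```

If the executable is not installed, subprocess.run raises FileNotFoundError, and a hung tool raises subprocess.TimeoutExpired, so the caller may want to catch both.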
[1] Line 35 of the linked schema:
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
[2] lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param
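To see in advance whether a schema is likely to hit this limitation, one option is to read the queryBinding attribute quoted in [1] before handing the schema to lxml. The sketch below uses only the standard library; get_query_binding is a name invented for this example, and the fallback to "xslt" when the attribute is absent reflects my understanding of the ISO Schematron default rather than anything stated in the question.

```python
import xml.etree.ElementTree as ET

SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"


def get_query_binding(schema_source) -> str:
    """Return the lower-cased queryBinding of an ISO Schematron schema.

    `schema_source` is a filename or a binary file-like object.
    Assumption: a missing attribute means the default binding, "xslt".
    """
    root = ET.parse(schema_source).getroot()
    if root.tag != "{%s}schema" % SCHEMATRON_NS:
        raise ValueError("not an ISO Schematron schema: %r" % root.tag)
    return root.get("queryBinding", "xslt").lower()
```

A caller could then refuse to build lxml's Schematron object, or switch to an external tool, when the result is "xslt2".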
Answer 1 (score: 0)
Solved by using the XSD schema from the archive together with lxml.etree.XMLSchema:
from lxml import etree


def validate(filename: str) -> bool:
    schema_filename = '/path/to/ISOSTS.xsd'
    with open(schema_filename, 'rb') as schema_file:
        xsd_doc = etree.parse(schema_file)
    xmlschema = etree.XMLSchema(xsd_doc)
    with open(filename, 'rb') as file:
        doc = etree.parse(file)
    # True if the document is valid; details are in xmlschema.error_log.
    return xmlschema.validate(doc)