我正在使用lxml来解析和客观化路径中的xml文件,我有很多模型和xsd,每个对象模型都映射到某些定义的类,例如,如果xml以model标签开头,那么它就是一个dataModel和如果以页面标记开头,则为viewModel。
我的问题是如何以有效的方式检测xml文件以哪个标记开头,然后使用适当的xsd文件解析它然后将其客观化
files = glob(os.path.join('resources/xml', '*.xml'))
for f in files:
xmlinput = open(f)
xmlContent = xmlinput.read()
if xsdPath:
xsdFile = open(xsdPath)
# xsdFile should retrieve according to xml content
schema = etree.XMLSchema(file=xsdFile)
xmlinput.seek(0)
myxml = etree.parse(xmlinput)
try:
schema.assertValid(myxml)
except etree.DocumentInvalid as x:
print "In file %s error %s has occurred." % (xmlPath, x.message)
finally:
xsdFile.close()
xmlinput.close()
答案 0 :(得分:1)
我自愿放下文件阅读和治疗,专注于你的问题:
>>> from lxml.etree import fromstring
>>> # We have XMLs with different root tag
>>> tree1 = fromstring("<model><foo/><bar/></model>")
>>> tree2 = fromstring("<page><baz/><blah/></page>")
>>>
>>> # We have different treatments
>>> def modelTreatement(etree):
... return etree.xpath('//bar')
...
>>> def pageTreatment(etree):
... return etree.xpath('//blah')
...
>>> # Here is a recipe to read the root tag
>>> tree1.getroottree().getroot().tag
'model'
>>> tree2.getroottree().getroot().tag
'page'
>>>
>>> # So, by building an appropriated dict :
>>> tag_to_treatment_map = {'model': modelTreatement, 'page': pageTreatment}
>>> # You can run the right method on the right tree
>>> for tree in [tree1, tree2]:
... tag_to_treatment_map[tree.getroottree().getroot().tag](tree)
...
[<Element bar at 0x24979b0>]
[<Element blah at 0x2497a00>]
希望这对某人有用,即使我之前没有见过这个。