Set up XML search engine for missing reference within one document?

Time: 2015-06-25 19:14:57

Tags: xml xml-parsing

I have a very large XML file like this:

<root>
  <item id="1">
    <linkToItem>12345</linkToItem>
  </item>
  <item id="2">
    <linkToItem>234</linkToItem>
  </item>
  <!-- lots more items -->
  <item id="12345"/>
</root>

How do I set up an engine for a simple search that finds any element, such as <linkToItem>234</linkToItem>, whose corresponding <item> id is missing? I would rather avoid setting up a program like the oXygen editor with Saxon or another engine.

2 answers:

Answer 0 (score: 0)

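The usual solution is to write a schema for the XML and then validate the instance against that schema. The schema should contain definitions of the various elements, and the element declaration for the root element should define the referential integrity constraints.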
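
As a rough sketch of the kind of constraint described here, and not the answerer's own schema, an XSD for the question's sample could pair xs:key with xs:keyref on the root element's declaration. The constraint names itemKey and linkRef, the xs:string types, and the assumption that every <item> carries an id are all illustrative choices, not taken from the original.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="linkToItem" type="xs:string"
                          minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Every item/@id must be present and unique within root (hypothetical name). -->
    <xs:key name="itemKey">
      <xs:selector xpath="item"/>
      <xs:field xpath="@id"/>
    </xs:key>
    <!-- Every linkToItem value must match some item/@id (hypothetical name). -->
    <xs:keyref name="linkRef" refer="itemKey">
      <xs:selector xpath="item/linkToItem"/>
      <xs:field xpath="."/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Validated against a schema like this, the sample from the question should fail on <linkToItem>234</linkToItem>, since no <item> has id="234".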


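Answer 1 (score: 0)

Validating against an XML schema (as given in Michael's answer) is the better option, but if no schema is available or writing one would take too long, a quick and dirty script can be used, such as: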

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree

# Read the document from standard input (binary, so the parser handles the encoding).
tree = xml.etree.ElementTree.parse(sys.stdin.buffer)
# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_map = {c: p for p in tree.iter() for c in p}

# Collect every id declared on an <item> directly under the root.
id_elements = tree.findall('./item[@id]')
identifiers = {id_element.get('id') for id_element in id_elements}

# Report each <linkToItem> whose text matches no collected id.
ref_elements = tree.findall('./item/linkToItem')
for ref_element in ref_elements:
    ref_id = ref_element.text
    if ref_id not in identifiers:
        print('reference', ref_id, 'on item', parent_map[ref_element].get('id'), 'cannot be resolved')
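
Since the question mentions a very large file, a streaming variant of the same check may be worth considering. The sketch below is not part of the original answer: it uses xml.etree.ElementTree.iterparse, and the function name find_dangling_links is made up for illustration. It assumes, as in the sample, that <item> elements are direct children of the root, and it strips surrounding whitespace from link text, which the script above does not do.

```python
import io
import xml.etree.ElementTree as ET


def find_dangling_links(source):
    """Return (item_id, target_id) pairs whose target matches no <item id>.

    `source` is a filename or a binary file object (hypothetical helper).
    """
    known_ids = set()
    links = []  # (id of the referring item, referenced id)
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After the decrement, depth == 1 means a direct child of the root.
        if depth == 1 and elem.tag == "item":
            item_id = elem.get("id")
            if item_id is not None:
                known_ids.add(item_id)
            for link in elem.findall("linkToItem"):
                links.append((item_id, (link.text or "").strip()))
            # Free the item's children and text; the emptied element
            # itself stays attached to the root.
            elem.clear()
    # Resolve only after the whole file is read, so forward references work.
    return [(src, dst) for src, dst in links if dst not in known_ids]


sample = (b'<root><item id="1"><linkToItem>12345</linkToItem></item>'
          b'<item id="2"><linkToItem>234</linkToItem></item>'
          b'<item id="12345"/></root>')
for src, dst in find_dangling_links(io.BytesIO(sample)):
    print('reference', dst, 'on item', src, 'cannot be resolved')
```

The ids and links are still held in memory, so this only helps when the element tree itself, rather than the set of identifiers, is what makes the file too large.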