How can I modify the following Python script to ignore tag delimiters and attributes within tags?

Time: 2015-06-25 20:25:25

Tags: python regex xpath

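When building the XPath, how can I modify the code below so that it ignores the tag delimiters (the < and > characters that mark the start and end of a tag) and any attributes inside the tag?

Below is a Python script that reads a formatted (indented) XML document and then determines the XPath from the current cursor position: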

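import re

import sublime  # Sublime Text plugin API; only available inside the editor


def buildPath(view, selection):
    path = ['']
    lines = []

    # Collect every line from the start of the file up to the cursor.
    region = sublime.Region(0, selection.end())
    for line in view.lines(region):
        contents = view.substr(line)
        lines.append(contents)

    level = -1
    spaces = re.compile(r'^\s+')
    for line in lines:
        # The indentation width is used as the nesting depth.
        space = spaces.findall(line)
        current = len(space[0]) if len(space) else 0
        # Reduce the line to its element name; lines with attributes do not match.
        node = re.sub(r'\s*<\??([\w.]:)?([\w\-.]+)(\s.)?>.*', r'\2', line)
        if current == level:
            # Sibling: replace the last path component.
            path.pop()
            path.append(node)
        elif current > level:
            # Child: descend one level.
            path.append(node)
            level = current
        elif current < level:
            # Closing line: climb back up one level.
            path.pop()
            level = current

    return path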
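
For reference, here is a minimal sketch of one way to pull only the element name out of a line, dropping the < and > delimiters, an optional namespace prefix, and any attributes. The names TAG_NAME and tag_name are hypothetical and not part of the script above, and the pattern assumes one tag per line with no > characters inside attribute values:

```python
import re

# Hypothetical pattern, not taken from the original script.
# Captures the local element name of an opening tag, a self-closing tag,
# or an XML declaration, and skips everything else up to the first ">".
TAG_NAME = re.compile(r'^\s*<\??(?:[\w\-.]+:)?([\w\-.]+)(?:\s[^>]*)?/?>')


def tag_name(line):
    """Return the element name found on this line, or None if there is none."""
    match = TAG_NAME.match(line)
    return match.group(1) if match else None


print(tag_name('  <item id="1" class="x">text</item>'))  # item
print(tag_name('<ns:node attr="v"/>'))                    # node
print(tag_name('</item>'))                                # None
```

Returning None for closing tags and plain text lines lets the caller decide whether to pop a path component or skip the line, instead of appending the raw line as the script above does when the substitution fails to match.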

1 answer:

Answer 0 (score: 1)

Get a copy of lxml (pip install lxml):

import lxml.etree
# xmlasstring is the XML document as a string
tree = lxml.etree.fromstring(xmlasstring)
# returns a list of every element named "node"
tree.xpath('//node')
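
Letting a real parser handle the markup means delimiters and attributes never reach the path-building code. As a rough sketch of that idea, the example below uses the standard library's xml.etree.ElementTree rather than lxml, so it runs without extra installs. The helper element_path and the sample document are made up for illustration. lxml provides a comparable getpath method on its ElementTree objects.

```python
import xml.etree.ElementTree as ET


def element_path(root, target):
    """Return a path of tag names from root down to target, or None."""
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    if target is not root and target not in parents:
        return None  # target does not belong to this tree
    names = []
    node = target
    while node is not None:
        names.append(node.tag)  # the tag name only, attributes are ignored
        node = parents.get(node)
    return '/' + '/'.join(reversed(names))


# Sample document, assumed for illustration only.
xml_text = '<root><section id="a"><item class="x">text</item></section></root>'
root = ET.fromstring(xml_text)
item = root.find('.//item')
print(element_path(root, item))  # /root/section/item
```

This sketch does not add positional predicates such as item[2], so sibling elements with the same name yield the same path.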