Question

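The code below uses lxml (Python 3.3) to read a table from an Excel 2003 XML workbook. The code works, but to access the Type attribute of the Data element through the get() method I have to use the key '{urn:schemas-microsoft-com:office:spreadsheet}Type'. Why is that, given that I have already mapped this namespace to the ss prefix?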

All I can think of is that this namespace appears twice in the document, once with a namespace prefix and once without.

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">

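In the file, elements and attributes are declared as follows: the Type attribute carries the ss: prefix, while the Cell and Data elements have no prefix. However, the declarations say that both belong to the same schema, 'urn:schemas-microsoft-com:office:spreadsheet', so surely the parser should treat them as equivalent?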

<Cell><Data ss:Type="String">QB11128020</Data></Cell>

My code:

from lxml import etree

# filename is the path to the Excel 2003 XML workbook.
# Open in binary mode so the parser can honour the XML encoding declaration.
with open(filename, 'rb') as f:
    doc = etree.parse(f)

# Prefix-to-URI map used only by the XPath expressions below
namespaces = {'o': 'urn:schemas-microsoft-com:office:office',
              'x': 'urn:schemas-microsoft-com:office:excel',
              'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}

ws = doc.xpath('/ss:Workbook/ss:Worksheet', namespaces=namespaces)
if len(ws) > 0:
    tables = ws[0].xpath('./ss:Table', namespaces=namespaces)
    if len(tables) > 0:
        rows = tables[0].xpath('./ss:Row', namespaces=namespaces)
        for row in rows:
            cells = row.xpath('./ss:Cell/ss:Data', namespaces=namespaces)
            for cell in cells:
                print(cell.text)
                print(cell.keys())
                # get() needs the attribute name in '{uri}name' form
                print(cell.get('{urn:schemas-microsoft-com:office:spreadsheet}Type'))

Answer 1

According to The lxml.etree Tutorial -- Namespaces:

The ElementTree API avoids namespace prefixes wherever possible and deploys the real namespaces (the URI) instead:
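To make that concrete, here is a minimal sketch (not taken from the tutorial) of how tag and attribute names look once parsed. It assumes a cut-down document modelled on the one in the question; the unprefixed Note attribute is a hypothetical addition, included only to show that the default xmlns declaration applies to elements but not to attributes. The import falls back to the standard library's xml.etree.ElementTree, which uses the same '{uri}name' convention.

```python
try:
    from lxml import etree
except ImportError:  # the stdlib module follows the same naming convention
    import xml.etree.ElementTree as etree

SS = 'urn:schemas-microsoft-com:office:spreadsheet'

# Cut-down workbook: default namespace and the ss prefix share one URI.
# 'Note' is a hypothetical unprefixed attribute added for comparison.
xml = (
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" '
    'xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
    '<Cell><Data ss:Type="String" Note="x">QB11128020</Data></Cell>'
    '</Workbook>'
)
root = etree.fromstring(xml)
data = root[0][0]

# Unprefixed elements pick up the default namespace
print(data.tag)                    # {urn:schemas-microsoft-com:office:spreadsheet}Data

# Attribute names: the prefixed one is stored with its URI,
# the unprefixed one has no namespace at all
print(sorted(data.keys()))

print(data.get('{%s}Type' % SS))   # String
print(data.get('Type'))            # None: there is no un-namespaced 'Type'
print(data.get('Note'))            # x
```

So the ss prefix in the namespaces dict is only a shorthand for XPath expressions; get() and keys() never see prefixes, only the URI.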

BTW, the following

cell.get('{urn:schemas-microsoft-com:office:spreadsheet}Type')

can be written as:

cell.get('{%(ss)s}Type' % namespaces)

or:

cell.get('{{{0[ss]}}}Type'.format(namespaces))
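Another option, if you would rather not hand-format the braces, is to let QName build the key. This is only a sketch: qualified() is a hypothetical helper name, and the one-element document is a stand-in for a real Data cell. The import falls back to the standard library's xml.etree.ElementTree, whose QName behaves the same way for this purpose.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; QName(uri, tag).text works the same
    import xml.etree.ElementTree as etree

namespaces = {'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}


def qualified(prefix, local_name, nsmap=namespaces):
    """Hypothetical helper: turn ('ss', 'Type') into '{uri}Type'."""
    return etree.QName(nsmap[prefix], local_name).text


# Stand-in for one Data element from the workbook
cell = etree.fromstring(
    '<Data xmlns="urn:schemas-microsoft-com:office:spreadsheet" '
    'xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" '
    'ss:Type="String">QB11128020</Data>'
)

key = qualified('ss', 'Type')
print(key)            # {urn:schemas-microsoft-com:office:spreadsheet}Type
print(cell.get(key))  # String
```

This keeps the prefix-to-URI mapping in one place, so the same namespaces dict serves both the XPath calls and the attribute lookups.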
