Question

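I have an Excel XML file, and I need to get the style ID of the Style element whose Interior (cell fill) has a particular color.

Here is an example of the Excel XML.

This is the header of the file: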

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns:x="urn:schemas-microsoft-com:office:excel"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:html="http://www.w3.org/TR/REC-html40">
  <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

And this is the part I need to access:

<Style ss:ID="s64">
   <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
</Style>

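I need to write a function that takes the color #00CC00 and finds this Interior element, so that I can then go to its parent and read the ID.

I have tried the following code, but it does not work. I think I should be using namespaces.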

parser = et.parse(str(file))
color = parser.xpath("//interior[@ss:Color='#FFCC00'")
par = color.getparent()
print(par)

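I need the code to return "s64".

But this is not valid code. What am I missing?

Edit: I want to add some extra information. After reading up on this some more, I wrote the following code: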

def _find_color(self):
    """
    Find the color in the xml file and returns the attribute.
    """
    print('The folder is: ', self.path)
    nsd ={'Default':'urn:schemas-microsoft-com:office:spreadsheet',
                'o': 'urn:schemas-microsoft-com:office:office', 
                'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}
    if pathlib.Path(self.path).exists():
        for file in self.folder.glob('**/*.xml'):
            print('The file is ', file)
            parser = et.parse(str(file))
            color = parser.xpath("//style/interior[@ss:Color='#00CC00']",namespaces=nsd)
            print(color)
            #par = color.getparent()
            #print(par)

However, it returns an empty list, so it is not finding anything.

Here is the whole part of the source that I am interested in:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns:x="urn:schemas-microsoft-com:office:excel"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:html="http://www.w3.org/TR/REC-html40">
  <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
     <Author>Somebody</Author>
     <LastAuthor>Somebody</LastAuthor>
     <Created>2016-05-16T10:44:52Z</Created>
     <Company>SomeCompany</Company>
     <Version>12.00</Version>
  </DocumentProperties>
  <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
     <WindowHeight>9495</WindowHeight>
     <WindowWidth>20835</WindowWidth>
     <WindowTopX>240</WindowTopX>
     <WindowTopY>420</WindowTopY>
     <ProtectStructure>False</ProtectStructure>
     <ProtectWindows>False</ProtectWindows>
  </ExcelWorkbook>
  <Styles>
    <Style ss:ID="Default" ss:Name="Normal">
      <Alignment ss:Vertical="Bottom"/>
      <Borders/>
      <Font ss:FontName="Arial" x:Family="Swiss"/>
      <Interior/>
      <NumberFormat/>
      <Protection/>
    </Style>
    <Style ss:ID="s63">
      <Font ss:FontName="Arial" x:Family="Swiss" ss:Color="#FF0000" ss:Bold="1"/>
    </Style>
    <Style ss:ID="s64">
      <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
    </Style>
    <Style ss:ID="s65">
       <Font ss:FontName="Arial" x:Family="Swiss" ss:Color="#FF0000" ss:Bold="1"/>
     <Interior ss:Color="#44CF00" ss:Pattern="Solid"/>
      </Style>
   </Styles>

I cannot find the element by its attribute using XPath.

Answer 1

Here is how it can be done.

from lxml import etree as ET

NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

tree = ET.parse("workbook.xml")
# A relative path (".//") avoids the FutureWarning lxml emits for absolute paths in find().
interior = tree.find(".//ss:Style/ss:Interior[@ss:Color='#00CC00']", namespaces=NS)
print(interior.getparent().get("{urn:schemas-microsoft-com:office:spreadsheet}ID"))

Output:

s64

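Comments:

The ss prefix must be used on all elements in the path, because the default namespace of the document is the same spreadsheet namespace.
XML is case-sensitive (Style != style).
When getting the value of the namespaced ID attribute, you must use the URI (not the prefix).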
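The three points above can be put together in a small helper. This is only a sketch: the function name `style_ids_for_color` and the trimmed `SAMPLE` document are made up for illustration, and the color is passed as an XPath variable rather than formatted into the query string.

```python
from lxml import etree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Trimmed-down copy of the workbook from the question (illustrative only).
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Styles>
    <Style ss:ID="s64">
      <Interior ss:Color="#00CC00" ss:Pattern="Solid"/>
    </Style>
    <Style ss:ID="s65">
      <Interior ss:Color="#44CF00" ss:Pattern="Solid"/>
    </Style>
  </Styles>
</Workbook>"""


def style_ids_for_color(root, color):
    """Return the ss:ID of every Style whose Interior has the given ss:Color."""
    # $color is an XPath variable; lxml fills it from the keyword argument.
    interiors = root.xpath(
        "//ss:Style/ss:Interior[@ss:Color = $color]", namespaces=NS, color=color
    )
    # get() needs the "{uri}local" form of the attribute name, not "ss:ID".
    return [el.getparent().get("{%s}ID" % SS) for el in interiors]


root = ET.fromstring(SAMPLE)
print(style_ids_for_color(root, "#00CC00"))  # ['s64']
print(root.xpath("//style/interior"))        # [] (lowercase names, no prefix)
```

The last line mirrors the query from the question: with lowercase names and no namespace prefix, nothing matches, which is why an empty list came back.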

Answer 2

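After some searching I finally found a solution. It looks like one of my mistakes was that I was not getting the root of the tree (I solved that with getroot()), so my solution is: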

def _find_color(self):
    """
    Find the color in the xml file and returns the attribute.
    """
    print('The folder is: ', self.path)
    nsd ={'Default':'urn:schemas-microsoft-com:office:spreadsheet',
                'o': 'urn:schemas-microsoft-com:office:office', 
                'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}
    if pathlib.Path(self.path).exists():
        for file in self.folder.glob('**/*.xml'):
            print('The file is ', file)
            parser = et.parse(str(file))
            root=parser.getroot()
            color = root.xpath("//Default:Interior[@ss:Color='#00CC00']",namespaces=nsd)
            print(color)
            for element in color:
                print('Tag: ', element.tag, 'Attribute: ', element.attrib)
                par_id= element.getparent().get("{urn:schemas-microsoft-com:office:spreadsheet}ID")
                print(par_id)

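This returns s64.

For the part that gets the parent's ID, I used the solution that mzjn provided. I learned that I have to use the URI rather than the short name (the prefix).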
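As a possible variation on the same idea, the URI-based attribute name does not have to be typed by hand. The sketch below, which uses a one-style sample document invented for illustration, shows two alternatives: letting XPath return the `@ss:ID` attribute directly, and building the `{uri}ID` name with lxml's `QName`.

```python
from lxml import etree as et

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Minimal stand-in for the workbook (illustrative only).
root = et.fromstring(
    '<Workbook xmlns="%s" xmlns:ss="%s"><Styles>'
    '<Style ss:ID="s64"><Interior ss:Color="#00CC00" ss:Pattern="Solid"/></Style>'
    "</Styles></Workbook>" % (SS, SS)
)

# Option 1: select the attribute in XPath, where the ss prefix is understood.
ids = root.xpath(
    "//ss:Style[ss:Interior/@ss:Color='#00CC00']/@ss:ID", namespaces=NS
)
print(ids)  # ['s64']

# Option 2: build the "{uri}ID" name with QName and pass it to get().
id_name = et.QName(SS, "ID").text
style = root.xpath(
    "//ss:Style[ss:Interior/@ss:Color='#00CC00']", namespaces=NS
)[0]
print(style.get(id_name))  # s64
```

Option 1 also removes the need for getparent(), since the predicate selects the Style element itself.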

Get the parent attribute of an Excel XML element with Python lxml
