Question

For validation purposes: how can I walk through an entire XML document node by node (including child nodes), such as the one below:

XML file:

<Summary>
<Hardware_Info>
    <HardwareType>FlashDrive</HardwareType>
    <ManufacturerDetail>
            <ManufacturerCompany>Company1</ManufacturerCompany>
            <ManufacturerDate>2017-07-20T12:26:04-04:00</ManufacturerDate>
            <ModelCode>4BR6282</ModelCode>
    </ManufacturerDetail>
    <ActivationDate>2017-07-20T12:26:04-04:00</ActivationDate>
</Hardware_Info>
<DeviceConnectionInfo>
    <Device>
        <Index>0</Index>
        <Name>Laptop1</Name>
        <Status>Installed</Status>
    </Device>
    <Device>
        <Index>1</Index>
        <Name>Laptop2</Name>
        <Status>Installed</Status>
    </Device>
</DeviceConnectionInfo>
</Summary>

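and look up the values that correspond to the matching columns of a particular table? For the sake of example, the table looks like this:

Table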

HardwareType    ManufacturerCompany    ManufacturerDate             ActivationDate              Device.Index        Name
FlashDrive      Company1               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   0                   Laptop1
FlashDrive      Company2               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   1                   Laptop2

In this case, I would have a list of columns:

HardwareType, ManufacturerCompany, ManufacturerDate, ActivationDate, Device.Index, Name

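For my final result, I want to print the table's column names together with the values found in the XML for those names. For example, something similar to the original table (assuming validation passes):

Output: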

HardwareType    ManufacturerCompany    ManufacturerDate             ActivationDate              Device.Index        Name
FlashDrive      Company1               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   0                   Laptop1
FlashDrive      Company2               2017-07-20T12:26:04-04:00    2017-07-20T12:26:04-04:00   1                   Laptop2

Current implementation:

I am able to get the list of the table's column names, but so far the best I have managed is:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/test.xml")
root = tree.getroot()

f = open('/test.csv', 'w')

csvwriter = csv.writer(f)

count = 0

head = ['ManufacturerCompany','ManufacturerDate',...]

csvwriter.writerow(head)

for time in root.findall('Summary'):
     row = []
     job_name = time.find('ManufacturerDetail').find('ManufacturerCompany').text
     row.append(job_name)
     job_name = time.find('ManufacturerDetail').find('ManufacturerDate').text
     row.append(job_name)
     csvwriter.writerow(row)
f.close()

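However, this implementation does not loop over every field I want to output. Any guidance or suggestions on an implementation would be great.

Thanks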

Answer 1

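Consider XSLT, a special-purpose language designed to transform XML files into other XML, into HTML (its main use), and even into text files (TXT/CSV) with method="text". Specifically, walk down to the Device node level and pull in the values from the ancestor nodes.

Python's third-party lxml module can run XSLT 1.0 scripts. XSLT is also portable, so any XSLT processor can run this code, including xsltproc, which is available on Unix (Linux/Mac).

XSLT (save as an .xsl file, which is itself a well-formed XML file; &#xa; is the character reference for a line feed)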

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="delimiter">,</xsl:param>

  <xsl:template match="/Summary">
    <xsl:text>HardwareType,ManufacturerCompany,ManufacturerDate,ActivationDate,Device.Index,Name&#xa;</xsl:text>    
    <xsl:apply-templates select="DeviceConnectionInfo"/>    
  </xsl:template>

  <xsl:template match="DeviceConnectionInfo">
    <xsl:apply-templates select="Device"/>    
  </xsl:template>

  <xsl:template match="Device">
    <xsl:value-of select="concat(ancestor::Summary/Hardware_Info/HardwareType, $delimiter,
                                 ancestor::Summary/Hardware_Info/ManufacturerDetail/ManufacturerCompany, $delimiter,
                                 ancestor::Summary/Hardware_Info/ManufacturerDetail/ManufacturerDate, $delimiter,
                                 ancestor::Summary/Hardware_Info/ActivationDate, $delimiter,
                                 Index, $delimiter,
                                 Name)"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python (using lxml)

import lxml.etree as et

# LOAD XML AND XSL
doc = et.parse('input.xml')
xsl = et.parse('xslt_script.xsl')

# TRANSFORM INPUT TO STRING
transform = et.XSLT(xsl)    
result = str(transform(doc))

# SAVE TO FILE
with open('output.csv', 'w') as f:
    f.write(result)
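
Since the stated goal is validation, the CSV text held in result can then be compared cell by cell with the expected table. The following is only a minimal sketch using the standard library: find_mismatches and expected are hypothetical names introduced for illustration, the expected rows are copied from the table in the question, and result is hard-coded to the output of this answer so that the sketch runs without lxml.

```python
import csv
import io

# Stand-in for str(transform(doc)); hard-coded (assumption) so the sketch
# runs on its own. It mirrors the output listed in this answer.
result = (
    "HardwareType,ManufacturerCompany,ManufacturerDate,ActivationDate,Device.Index,Name\n"
    "FlashDrive,Company1,2017-07-20T12:26:04-04:00,2017-07-20T12:26:04-04:00,0,Laptop1\n"
    "FlashDrive,Company1,2017-07-20T12:26:04-04:00,2017-07-20T12:26:04-04:00,1,Laptop2\n"
)

COLUMNS = ["HardwareType", "ManufacturerCompany", "ManufacturerDate",
           "ActivationDate", "Device.Index", "Name"]

# Expected rows, copied from the table in the question.
expected = [
    dict(zip(COLUMNS, ["FlashDrive", "Company1", "2017-07-20T12:26:04-04:00",
                       "2017-07-20T12:26:04-04:00", "0", "Laptop1"])),
    dict(zip(COLUMNS, ["FlashDrive", "Company2", "2017-07-20T12:26:04-04:00",
                       "2017-07-20T12:26:04-04:00", "1", "Laptop2"])),
]


def find_mismatches(csv_text, expected_rows):
    """Return (row_number, column, expected, found) for every differing cell.

    Rows are paired by position; zip() stops at the shorter list, so a
    difference in row counts has to be checked separately.
    """
    actual_rows = list(csv.DictReader(io.StringIO(csv_text)))
    mismatches = []
    for number, (want, got) in enumerate(zip(expected_rows, actual_rows), start=1):
        for column, value in want.items():
            if got.get(column) != value:
                mismatches.append((number, column, value, got.get(column)))
    return mismatches


mismatches = find_mismatches(result, expected)
for number, column, want, got in mismatches:
    print(f"row {number}, {column}: expected {want!r}, found {got!r}")
```

With the sample data, the only difference reported is ManufacturerCompany in the second row, because the XML contains a single manufacturer (Company1) while the table lists Company2.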

Python (a one-line command call to xsltproc)

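from subprocess import run

# Pass the arguments as a list (no shell needed) and wait for xsltproc to finish;
# check=True raises CalledProcessError if xsltproc exits with an error
run(['xsltproc', '-o', 'output.csv', 'xslt_script.xsl', 'input.xml'],
    cwd='/path/to/working/directory', check=True)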

Output

# HardwareType  ManufacturerCompany ManufacturerDate    ActivationDate  Device.Index    Name
# FlashDrive    Company1    2017-07-20T12:26:04-04:00   2017-07-20T12:26:04-04:00   0   Laptop1
# FlashDrive    Company1    2017-07-20T12:26:04-04:00   2017-07-20T12:26:04-04:00   1   Laptop2
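
If adding lxml or xsltproc is not an option, the same idea (walk down to each Device and pull in the shared values from Hardware_Info) can be sketched with the standard library alone. This is an alternative to the XSLT above, not a replacement for it. The SHARED and PER_DEVICE mappings and the xml_to_rows helper are hypothetical names chosen for illustration, and the XML is inlined so the sketch is self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_TEXT = """<Summary>
<Hardware_Info>
    <HardwareType>FlashDrive</HardwareType>
    <ManufacturerDetail>
        <ManufacturerCompany>Company1</ManufacturerCompany>
        <ManufacturerDate>2017-07-20T12:26:04-04:00</ManufacturerDate>
        <ModelCode>4BR6282</ModelCode>
    </ManufacturerDetail>
    <ActivationDate>2017-07-20T12:26:04-04:00</ActivationDate>
</Hardware_Info>
<DeviceConnectionInfo>
    <Device><Index>0</Index><Name>Laptop1</Name><Status>Installed</Status></Device>
    <Device><Index>1</Index><Name>Laptop2</Name><Status>Installed</Status></Device>
</DeviceConnectionInfo>
</Summary>"""

COLUMNS = ["HardwareType", "ManufacturerCompany", "ManufacturerDate",
           "ActivationDate", "Device.Index", "Name"]

# Assumed mapping from column name to element path.
# SHARED paths are relative to <Summary>; PER_DEVICE paths to each <Device>.
SHARED = {
    "HardwareType": "Hardware_Info/HardwareType",
    "ManufacturerCompany": "Hardware_Info/ManufacturerDetail/ManufacturerCompany",
    "ManufacturerDate": "Hardware_Info/ManufacturerDetail/ManufacturerDate",
    "ActivationDate": "Hardware_Info/ActivationDate",
}
PER_DEVICE = {"Device.Index": "Index", "Name": "Name"}


def xml_to_rows(root, columns):
    """Build one row per <Device>, repeating the shared hardware values."""
    shared = {col: root.findtext(path, default="") for col, path in SHARED.items()}
    rows = []
    for device in root.iterfind("DeviceConnectionInfo/Device"):
        values = dict(shared)
        for col, path in PER_DEVICE.items():
            values[col] = device.findtext(path, default="")
        # Unknown column names yield an empty cell instead of raising.
        rows.append([values.get(col, "") for col in columns])
    return rows


root = ET.fromstring(XML_TEXT)  # the root element is <Summary> itself
rows = xml_to_rows(root, COLUMNS)

# Write to an in-memory buffer; swap in open('output.csv', 'w', newline='') for a file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the root returned by the parser is already the Summary element, so paths are written relative to it; searching the root for a child named Summary, as in the question's loop, finds nothing.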

Python: How to iterate over each XML node and print values based on a list
