Question

I have an XML file in the following format:

<Main1>
  <Sub1>
    <Name>Test</Name>
    <ID>12345</ID>
    <Sub2>
      <Prop>
        <Key>A</Key>
        <Value>Apple</Value>
      </Prop>
      <Prop>
        <Key>B</Key>
        <Value>Ball</Value>
      </Prop>
    </Sub2>
    <Sub3>
      <Order>
        <OID>54321</OID>
        <ODate>2016-01-01</ODate>
      </Order>
    </Sub3>
  </Sub1>
</Main1>

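I am trying to have Python import this XML and split it into three different files: one with the person's name and ID, one with the properties, and one with the order information. When I split it, however, I want to add the customer ID to the properties and orders files. So the orders file might end up looking like: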

<Orders>
  <Order>
    <ID>12345</ID>
    <OID>54321</OID>
    <ODate>2016-01-01</ODate>
  </Order>
</Orders>

Answer 1

Use lxml and element.xpath() to select the nodes you need, and append them as required to nodes in a new XML document.

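XPath is not a concept introduced by lxml; it is a general query language for selecting nodes from an XML document, supported by many tools that deal with XML. Think of it as something similar to CSS selectors, but more powerful (and also a bit more complicated). See XPath Syntax.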

So, for example,

tree.xpath('/Main1/Sub1')

will select the <Sub1 /> elements directly below the <Main1 /> node.

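Note that .xpath() always returns a list of the selected nodes, so if you only want a single node, account for that (for example, by indexing with [0]).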
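To make the list behaviour concrete, here is a minimal sketch using a trimmed-down copy of the document from the question. The `Email` element is hypothetical and only there to show what happens when nothing matches.

```python
from lxml import etree

xml = b"""<Main1>
  <Sub1>
    <Name>Test</Name>
    <ID>12345</ID>
  </Sub1>
</Main1>"""

root = etree.fromstring(xml)

# A node-selecting expression returns a list, even for a single match.
matches = root.xpath('/Main1/Sub1/ID')
print(type(matches).__name__, len(matches))

# Index into the list to get the element itself.
customer_id = matches[0].text

# A path that matches nothing yields an empty list, not None,
# so indexing blindly would raise IndexError.
missing = root.xpath('/Main1/Sub1/Email')  # hypothetical element
first_email = missing[0] if missing else None
```

Guarding with `if missing` is one way to handle optional elements; whether a missing node should be skipped or treated as an error depends on your data.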

So, something like this should work:

from copy import copy
from lxml import etree


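def parse(filename):
    # remove_blank_text drops whitespace-only text nodes so that
    # pretty_print can re-indent the output later.
    parser = etree.XMLParser(remove_blank_text=True)
    # Pass the filename so lxml opens the file in binary mode itself.
    tree = etree.parse(filename, parser=parser)
    return tree


def dump_to_file(root, filename_base, id_):
    customer_id = id_.text.strip()
    filename = '%s-%s.xml' % (filename_base, customer_id)
    # lxml serializes to bytes, so the file must be opened in binary mode.
    with open(filename, 'wb') as xml_file:
        etree.ElementTree(root).write(xml_file, pretty_print=True)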


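def dump_orders(id_, orders):
    root = etree.XML('<Orders/>')
    for order in orders:
        # Copy the <ID> so the customer's own element stays in place;
        # appending an lxml element that already has a parent moves it.
        order.append(copy(id_))
        # The <Order> itself is moved out of the source tree here.
        root.append(order)
    dump_to_file(root, 'orders', id_)


def dump_properties(id_, properties):
    root = etree.XML('<Properties/>')
    for prop in properties:
        # Same pattern as dump_orders: copy the <ID>, move the <Prop>.
        prop.append(copy(id_))
        root.append(prop)
    dump_to_file(root, 'properties', id_)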


def dump_customer(id_, name):
    root = etree.XML('<Customer/>')
    root.append(copy(id_))
    root.append(copy(name))
    dump_to_file(root, 'customer', id_)


root = parse('complete.xml')
customers = root.xpath('/Main1/Sub1')

for customer in customers:
    name = customer.xpath('./Name')[0]
    id_ = customer.xpath('./ID')[0]
    dump_customer(id_, name)

    properties = customer.xpath('./Sub2/Prop')
    dump_properties(id_, properties)

    orders = customer.xpath('./Sub3/Order')
    dump_orders(id_, orders)

This will create three files like these for each customer:

customer-12345.xml

<Customer>
  <ID>12345</ID>
  <Name>Test</Name>
</Customer>

orders-12345.xml

<Orders>
  <Order>
    <OID>54321</OID>
    <ODate>2016-01-01</ODate>
    <ID>12345</ID>
  </Order>
</Orders>

properties-12345.xml

<Properties>
  <Prop>
    <Key>A</Key>
    <Value>Apple</Value>
    <ID>12345</ID>
  </Prop>
  <Prop>
    <Key>B</Key>
    <Value>Ball</Value>
    <ID>12345</ID>
  </Prop>
</Properties>
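In the output above the <ID> ends up as the last child, because append() adds it after the existing children, whereas the sample in the question lists <ID> first. If the position matters, one possible tweak is to use insert(0, ...) instead. This is a small sketch on a standalone <Order> element, not a change to the code above:

```python
from copy import copy
from lxml import etree

order = etree.fromstring(
    '<Order><OID>54321</OID><ODate>2016-01-01</ODate></Order>')
id_ = etree.fromstring('<ID>12345</ID>')

# insert(0, ...) places the copied <ID> before the existing children,
# whereas append() would add it after <ODate>.
order.insert(0, copy(id_))

child_tags = [child.tag for child in order]
print(etree.tostring(order, pretty_print=True).decode())
```

The same replacement should apply to the properties loop if the <Prop> elements need the <ID> first as well.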

For details on XPath syntax, see the XPath Syntax page of the W3Schools XPath Tutorial.

To get started with XPath, it is also very helpful to experiment with your own document in one of the XPath testers.
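If you would rather experiment locally, a few lines of Python can serve a similar purpose. The sketch below runs a handful of expressions against the document from the question; the expressions themselves are just illustrative choices, not part of the solution above.

```python
from lxml import etree

doc = etree.fromstring(b"""<Main1>
  <Sub1>
    <Name>Test</Name>
    <ID>12345</ID>
    <Sub2>
      <Prop><Key>A</Key><Value>Apple</Value></Prop>
      <Prop><Key>B</Key><Value>Ball</Value></Prop>
    </Sub2>
    <Sub3>
      <Order><OID>54321</OID><ODate>2016-01-01</ODate></Order>
    </Sub3>
  </Sub1>
</Main1>""")

expressions = [
    '/Main1/Sub1/Name/text()',       # text of a node at a fixed path
    '//Prop[Key="B"]/Value/text()',  # predicate on a child element
    'count(//Prop)',                 # XPath function, returns a number
    '//Order/OID/text()',            # search at any depth
]

# Evaluate each expression and keep the result for inspection.
results = {expr: doc.xpath(expr) for expr in expressions}
for expr, result in results.items():
    print(expr, '->', result)
```

Note that count() returns a float rather than a list, so not every XPath expression produces a list of nodes.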
