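Write XML from a list of paths/values

Date: 2016-08-16 20:52:26

Tags: python xml xpath lxml

This is a follow-up to a previous question: Write xml with a path and value. I now want to add two more things: 1) attributes and 2) multiple items under the same parent node. Here is my list of paths: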

[
  {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
  {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
  {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},      
  {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
  {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
  {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
  {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
  {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]

The XML it should generate is:

<Item>
    <Info>
        <Name>Body HD</Name>
        <Synopsis>A great movie</Synopsis>
    </Info>
    <Locales>
        <Locale Country="US" Language="ES">
            <Name>El Grecco</Name>
        </Locale>
    </Locales>
    <Genres>
        <Genre>Action</Genre>
        <Genre>Drama</Genre>
    </Genres>
    <Purchases>
        <Purchase Country="US">
            <HDPrice>10.99</HDPrice>
            <SDPrice>9.99</SDPrice>
        </Purchase>
        <Purchase Country="CA">
            <SDPrice>4.99</SDPrice>
        </Purchase>
    </Purchases>
</Item>

How can I build this?

1 Answer:

Answer 0 (score: 1)

To build an XML tree from XPaths and values, I use regular expressions and lxml:

import re

from lxml import etree

The entries are:

entries = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]

To parse each XPath step, I use the following (very simple) regular expressions:

TAG_REGEX = r"(?P<tag>\w+)"
CONDITION_REGEX = r"(?P<condition>(?:\[.*?\])*)"
STEP_REGEX = TAG_REGEX + CONDITION_REGEX
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""

search_step = re.compile(STEP_REGEX, flags=re.DOTALL).search
findall_attr = re.compile(ATTR_REGEX, flags=re.DOTALL).findall


def parse_step(step):
    mo = search_step(step)
    if mo:
        tag = mo.group("tag")
        condition = mo.group("condition")
        return tag, dict(findall_attr(condition))
    raise ValueError(step)

parse_step returns the tag name and a dictionary of attributes.
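To make the return value concrete, here is a small self-contained sketch that re-declares the patterns from the answer (with the tag and condition patterns merged into one string for brevity) and calls parse_step on a few steps. The sample steps are taken from the entries above; the variable names plain, with_attrs and rooted are only illustrative.

```python
import re

# Same patterns as in the answer: a tag name followed by zero or more [...] predicates.
STEP_REGEX = r"(?P<tag>\w+)(?P<condition>(?:\[.*?\])*)"
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""

search_step = re.compile(STEP_REGEX, flags=re.DOTALL).search
findall_attr = re.compile(ATTR_REGEX, flags=re.DOTALL).findall


def parse_step(step):
    """Split one XPath step into (tag name, attribute dict)."""
    mo = search_step(step)
    if mo:
        return mo.group("tag"), dict(findall_attr(mo.group("condition")))
    raise ValueError(step)


plain = parse_step("Genre")
with_attrs = parse_step('Locale[@Country="US"][@Language="ES"]')
rooted = parse_step("/Item")  # search() skips the leading slash

print(plain)       # ('Genre', {})
print(with_attrs)  # ('Locale', {'Country': 'US', 'Language': 'ES'})
print(rooted)      # ('Item', {})
```

Note that ATTR_REGEX only recognizes double-quoted values, so a step such as Purchase[@Country='US'] would be parsed with an empty attribute dictionary.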

Then I build the XML tree in the same way:

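root = None
for entry in entries:
    path = entry["Path"]
    parts = path.split("/")
    # Make the first step absolute so it matches the existing root element.
    xpath_list = ["/" + parts[0]] + parts[1:]
    curr = root
    for xpath in xpath_list:
        tag_name, attrs = parse_step(xpath)
        if curr is None:
            # First entry: create the root element.
            root = curr = etree.Element(tag_name, **attrs)
        else:
            # Reuse a matching node if one exists, otherwise create it.
            nodes = curr.xpath(xpath)
            if nodes:
                curr = nodes[0]
            else:
                curr = etree.SubElement(curr, tag_name, **attrs)
    if curr.text:
        # The leaf already holds a value: add a sibling with the same tag and attributes.
        curr = etree.SubElement(curr.getparent(), curr.tag, **curr.attrib)
    curr.text = entry["Value"]

# encoding="unicode" makes tostring return a str rather than bytes on Python 3.
print(etree.tostring(root, pretty_print=True, encoding="unicode"))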

The result is:

<Item>
  <Info>
    <Name>Body HD</Name>
    <Synopsis>A great movie</Synopsis>
  </Info>
  <Locales>
    <Locale Country="US" Language="ES">
      <Name>El Grecco</Name>
    </Locale>
  </Locales>
  <Genres>
    <Genre>Action</Genre>
    <Genre>Drama</Genre>
  </Genres>
  <Purchases>
    <Purchase Country="US">
      <HDPrice>10.99</HDPrice>
      <SDPrice>9.99</SDPrice>
    </Purchase>
    <Purchase Country="CA">
      <SDPrice>4.99</SDPrice>
    </Purchase>
  </Purchases>
</Item>
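One way to sanity-check a result like this is to walk the entries again and confirm that each path resolves to an element holding the expected value. The sketch below is not part of the answer: it assumes the standard library's xml.etree.ElementTree instead of lxml so that it runs without third-party packages, and values_at is a hypothetical helper. Because ElementTree paths are relative to the element they are called on, the helper drops the leading root step before searching; with lxml, calling root.xpath on the path with a leading "/" should serve the same purpose.

```python
import xml.etree.ElementTree as ET

XML = """<Item>
  <Info><Name>Body HD</Name><Synopsis>A great movie</Synopsis></Info>
  <Locales>
    <Locale Country="US" Language="ES"><Name>El Grecco</Name></Locale>
  </Locales>
  <Genres><Genre>Action</Genre><Genre>Drama</Genre></Genres>
  <Purchases>
    <Purchase Country="US"><HDPrice>10.99</HDPrice><SDPrice>9.99</SDPrice></Purchase>
    <Purchase Country="CA"><SDPrice>4.99</SDPrice></Purchase>
  </Purchases>
</Item>"""

entries = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]


def values_at(root, path):
    """Return the text of every element matched by a root-anchored path."""
    root_step, _, rest = path.partition("/")
    if root_step != root.tag:
        return []
    if not rest:
        return [root.text]
    # ElementTree searches relative to `root`, so the root step is dropped.
    return [el.text for el in root.findall(rest)]


root = ET.fromstring(XML)
# Collect every entry whose value cannot be found at its path.
missing = [e for e in entries if e["Value"] not in values_at(root, e["Path"])]
print(missing)
```

An empty list means every entry was found at its path; any entry printed here would point to a path that was built incorrectly.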