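This is a follow-up to my previous question, Write xml with a path and value. I now want to add two more things: 1) attributes and 2) multiple items under the same parent node. Here is my list of paths: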
```python
[
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]
```
The XML it should generate is:
```xml
<Item>
  <Info>
    <Name>Body HD</Name>
    <Synopsis>A great movie</Synopsis>
  </Info>
  <Locales>
    <Locale Country="US" Language="ES">
      <Name>El Grecco</Name>
    </Locale>
  </Locales>
  <Genres>
    <Genre>Action</Genre>
    <Genre>Drama</Genre>
  </Genres>
  <Purchases>
    <Purchase Country="US">
      <HDPrice>10.99</HDPrice>
      <SDPrice>9.99</SDPrice>
    </Purchase>
    <Purchase Country="CA">
      <SDPrice>4.99</SDPrice>
    </Purchase>
  </Purchases>
</Item>
```
How can I build this?
Answer
To build an XML tree from XPaths and values, I use regular expressions and lxml:

```python
import re
from lxml import etree
```
The entries are:
```python
entries = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]
```
To parse each XPath step, I use the following (very simple) regular expressions:
```python
TAG_REGEX = r"(?P<tag>\w+)"                       # the element name
CONDITION_REGEX = r"(?P<condition>(?:\[.*?\])*)"  # zero or more [...] predicates
STEP_REGEX = TAG_REGEX + CONDITION_REGEX
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""  # one @key="value" pair

search_step = re.compile(STEP_REGEX, flags=re.DOTALL).search
findall_attr = re.compile(ATTR_REGEX, flags=re.DOTALL).findall


def parse_step(step):
    mo = search_step(step)
    if mo:
        tag = mo.group("tag")
        condition = mo.group("condition")
        # findall returns (key, value) tuples, which dict() turns into attributes
        return tag, dict(findall_attr(condition))
    raise ValueError(step)
```
`parse_step` returns the tag name and a dictionary of the attributes.
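To see what `parse_step` produces, here is a self-contained sketch that repeats the regular expressions above and runs them on a few steps taken from the entry list. The sample list of steps is only for illustration.

```python
import re

# Same patterns as above: a tag name followed by zero or more [...] predicates.
TAG_REGEX = r"(?P<tag>\w+)"
CONDITION_REGEX = r"(?P<condition>(?:\[.*?\])*)"
STEP_REGEX = TAG_REGEX + CONDITION_REGEX
# Each @key="value" predicate becomes one (key, value) pair.
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""

search_step = re.compile(STEP_REGEX, flags=re.DOTALL).search
findall_attr = re.compile(ATTR_REGEX, flags=re.DOTALL).findall


def parse_step(step):
    """Split one XPath step into (tag name, attribute dict)."""
    mo = search_step(step)
    if mo:
        tag = mo.group("tag")
        condition = mo.group("condition")
        return tag, dict(findall_attr(condition))
    raise ValueError(step)


for step in ["/Item", "Genre", 'Purchase[@Country="US"]',
             'Locale[@Country="US"][@Language="ES"]']:
    print(step, "->", parse_step(step))
```

The leading `/` of the first step is skipped because `search` finds the first run of word characters, so `/Item` gives `('Item', {})`, and the Locale step gives `('Locale', {'Country': 'US', 'Language': 'ES'})`. A step with no word characters at all (for example an empty string) raises `ValueError`.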
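Then I build the XML tree in the same way as before. Each step is looked up relative to the current node with `curr.xpath()` and created only if it does not exist yet. If the leaf already has a value (as with the second `Genre`), a sibling with the same tag and attributes is created instead: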
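```python
root = None
for entry in entries:
    path = entry["Path"]
    parts = path.split("/")
    # Make the first step absolute ("/Item") so it matches the root element itself.
    xpath_list = ["/" + parts[0]] + parts[1:]
    curr = root
    for xpath in xpath_list:
        tag_name, attrs = parse_step(xpath)
        if curr is None:
            # First step of the first entry: create the root element.
            root = curr = etree.Element(tag_name, **attrs)
        else:
            # Reuse an existing node matching this step, otherwise create it.
            nodes = curr.xpath(xpath)
            if nodes:
                curr = nodes[0]
            else:
                curr = etree.SubElement(curr, tag_name, **attrs)
    # The leaf already holds a value (e.g. the second Genre): add a sibling instead.
    if curr.text:
        curr = etree.SubElement(curr.getparent(), curr.tag, **curr.attrib)
    curr.text = entry["Value"]

# encoding="unicode" returns a str instead of bytes, so print shows plain XML.
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
```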
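If lxml is not available, the same find-or-create idea can be sketched with the standard library's `xml.etree.ElementTree`. This is a swapped-in alternative, not the answer's code: instead of evaluating each step as XPath, the hypothetical helper `find_child` compares a child's tag and attribute dictionary directly, and `build_tree` is likewise a name introduced here for illustration. `ET.indent` requires Python 3.9 or later.

```python
import re
import xml.etree.ElementTree as ET

# Same step-parsing regular expressions as in the answer.
TAG_REGEX = r"(?P<tag>\w+)"
CONDITION_REGEX = r"(?P<condition>(?:\[.*?\])*)"
ATTR_REGEX = r"@(?P<key>\w+)=\"(?P<value>.*?)\""
search_step = re.compile(TAG_REGEX + CONDITION_REGEX).search
findall_attr = re.compile(ATTR_REGEX).findall


def parse_step(step):
    mo = search_step(step)
    if mo is None:
        raise ValueError(step)
    return mo.group("tag"), dict(findall_attr(mo.group("condition")))


def find_child(parent, tag, attrs):
    """Hypothetical helper: first child with this tag and exactly these attributes."""
    for child in parent:
        if child.tag == tag and child.attrib == attrs:
            return child
    return None


def build_tree(entries):
    """Hypothetical helper: build an Element tree from Path/Value entries."""
    root = None
    for entry in entries:
        steps = [parse_step(s) for s in entry["Path"].split("/")]
        root_tag, root_attrs = steps[0]
        if root is None:
            root = ET.Element(root_tag, root_attrs)
        elif root.tag != root_tag:
            raise ValueError("all paths must share the same root element")
        parent, curr = None, root
        for tag, attrs in steps[1:]:
            child = find_child(curr, tag, attrs)
            if child is None:
                child = ET.SubElement(curr, tag, attrs)
            parent, curr = curr, child
        # A leaf that already holds a value gets a new sibling (e.g. a second <Genre>).
        if curr.text and parent is not None:
            curr = ET.SubElement(parent, curr.tag, dict(curr.attrib))
        curr.text = entry["Value"]
    return root


entries = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/HDPrice', 'Value': '10.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="US"]/SDPrice', 'Value': '9.99'},
    {'Path': 'Item/Purchases/Purchase[@Country="CA"]/SDPrice', 'Value': '4.99'},
]

root = build_tree(entries)
ET.indent(root, space="  ")  # pretty-print in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

One design difference to keep in mind: the XPath step `Purchase[@Country="US"]` in the lxml version matches any `Purchase` that has at least that attribute, and a bare step like `Genre` matches a `Genre` with any attributes, whereas `find_child` requires the attribute dictionary to match exactly. For the entries in the question both approaches produce the same tree.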
The result is:
```xml
<Item>
  <Info>
    <Name>Body HD</Name>
    <Synopsis>A great movie</Synopsis>
  </Info>
  <Locales>
    <Locale Country="US" Language="ES">
      <Name>El Grecco</Name>
    </Locale>
  </Locales>
  <Genres>
    <Genre>Action</Genre>
    <Genre>Drama</Genre>
  </Genres>
  <Purchases>
    <Purchase Country="US">
      <HDPrice>10.99</HDPrice>
      <SDPrice>9.99</SDPrice>
    </Purchase>
    <Purchase Country="CA">
      <SDPrice>4.99</SDPrice>
    </Purchase>
  </Purchases>
</Item>
```