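I have the XML below, saved in a file named movies.xml. I only need to convert some of its values to JSON. For a direct conversion I could use xmltodict. I am using etree and etree.XMLParser(), and I plan to load the result into Elasticsearch afterwards. So far I have managed to extract a single node using the attrib method.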
<?xml version="1.0" encoding="UTF-8" ?>
<collection>
<genre category="Action">
<decade years="1980s">
<movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
<format multiple="No">DVD</format>
<year>1981</year>
<rating>PG</rating>
<description>
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the
Covenant before the Nazis.'
</description>
</movie>
<movie favorite="True" title="THE KARATE KID">
<format multiple="Yes">DVD,Online</format>
<year>1984</year>
<rating>PG</rating>
<description>None provided.</description>
</movie>
<movie favorite="False" title="Back 2 the Future">
<format multiple="False">Blu-ray</format>
<year>1985</year>
<rating>PG</rating>
<description>Marty McFly</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="False" title="X-Men">
<format multiple="Yes">dvd, digital</format>
<year>2000</year>
<rating>PG-13</rating>
<description>Two mutants come to a private academy for their kind whose resident superhero team must
oppose a terrorist organization with similar powers.</description>
</movie>
<movie favorite="True" title="Batman Returns">
<format multiple="No">VHS</format>
<year>1992</year>
<rating>PG13</rating>
<description>NA.</description>
</movie>
<movie favorite="False" title="Reservoir Dogs">
<format multiple="No">Online</format>
<year>1992</year>
<rating>R</rating>
<description>WhAtEvER I Want!!!?!</description>
</movie>
</decade>
</genre>
<genre category="Thriller">
<decade years="1970s">
<movie favorite="False" title="ALIEN">
<format multiple="Yes">DVD</format>
<year>1979</year>
<rating>R</rating>
<description>"""""""""</description>
</movie>
</decade>
<decade years="1980s">
<movie favorite="True" title="Ferris Bueller's Day Off">
<format multiple="No">DVD</format>
<year>1986</year>
<rating>PG13</rating>
<description>Funny movie about a funny guy</description>
</movie>
<movie favorite="FALSE" title="American Psycho">
<format multiple="No">blue-ray</format>
<year>2000</year>
<rating>Unrated</rating>
<description>psychopathic Bateman</description>
</movie>
</decade>
</genre>
</collection>
The output I want is below:
First output {'Action':['Indiana Jones: The raiders of the lost Ark', 'THE KARATE KID', 'Back 2 the Future','X-Men', 'Batman Returns', 'Reservoir Dogs']}
second output {'movies':'description'}
third output {'movies': 'year'}
I have worked through the basics on DataCamp, but I cannot get the desired output. This is my attempt:
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse('movies.xml', parser)
data = tree.find("genre[@category='Action']")
json = {}
for child in enumerate(data.getchildren()):
    temp = {}
    for content in child[1].getchildren():
        temp[content.attrib.get('title')] = content.text.strip()
    json[child[0]] = temp.keys()
json
Answer 0 (score: 4)
I suggest using XSLT to convert the XML to JSON:
import json
from lxml import etree
XSL = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text"/>
<xsl:template match="/collection">
<xsl:text>{</xsl:text>
<xsl:apply-templates/>
<xsl:text>}</xsl:text>
</xsl:template>
<xsl:template match="genre">
<xsl:text>"</xsl:text>
<xsl:value-of select="@category"/>
<xsl:text>": [</xsl:text>
<xsl:for-each select="descendant::movie" >
<xsl:text>"</xsl:text>
<xsl:value-of select="@title"/>
<xsl:text>"</xsl:text>
<xsl:if test="position() != last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>]</xsl:text>
<xsl:if test="following-sibling::*">
<xsl:text>,
</xsl:text>
</xsl:if>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>'''
# load input
dom = etree.parse('movies.xml')
# load XSLT
transform = etree.XSLT(etree.fromstring(XSL))
# apply XSLT on loaded dom
json_text = str(transform(dom))
# json_text contains the data converted to JSON format.
# you can use it with the JSON API. Example:
data = json.loads(json_text)
print(data)
Output:
{'Action': ['Indiana Jones: The raiders of the lost Ark', 'THE KARATE KID', 'Back 2 the Future', 'X-Men', 'Batman Returns', 'Reservoir Dogs'], 'Thriller': ['ALIEN', "Ferris Bueller's Day Off", 'American Psycho']}
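One caveat: the stylesheet builds JSON by concatenating text, so a title containing a double quote or a backslash would produce invalid JSON. A minimal sketch of an alternative, assuming it is acceptable to build the dictionary in Python and let json.dumps handle the escaping. It uses the standard-library xml.etree.ElementTree instead of lxml (the iter and get calls used here exist in lxml's etree as well), and an abbreviated inline sample in place of movies.xml so that it runs standalone. The name titles_by_genre is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Abbreviated inline sample standing in for movies.xml
# (assumption: same collection/genre/decade/movie structure).
SAMPLE = '''<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"><year>1984</year></movie>
      <movie favorite="False" title="Back 2 the Future"><year>1985</year></movie>
    </decade>
    <decade years="1990s">
      <movie favorite="False" title="X-Men"><year>2000</year></movie>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1970s">
      <movie favorite="False" title="ALIEN"><year>1979</year></movie>
    </decade>
  </genre>
</collection>'''


def titles_by_genre(root):
    """Map each genre's category to the titles of all movies beneath it."""
    return {
        # iter("movie") walks every descendant <movie>, across all decades,
        # in document order.
        genre.get("category"): [movie.get("title") for movie in genre.iter("movie")]
        for genre in root.iter("genre")
    }


root = ET.fromstring(SAMPLE)
result = titles_by_genre(root)

# json.dumps escapes quotes and backslashes in titles.
print(json.dumps(result))
```

To run this against the real file, replace ET.fromstring(SAMPLE) with ET.parse('movies.xml').getroot().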
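However, I don't understand what you are trying to achieve with the "second output" and "third output", since as written those outputs appear to be constants.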
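If the second and third outputs are meant to map each movie title to its description and to its year (this is a guess at the intent, not something the question states), a sketch could look like the following. It uses the standard-library xml.etree.ElementTree rather than lxml, an abbreviated inline sample in place of movies.xml, and a hypothetical helper named field_by_title. Whitespace in multi-line descriptions is collapsed to single spaces.

```python
import xml.etree.ElementTree as ET

# Abbreviated inline sample standing in for movies.xml
# (assumption: same structure as the file in the question).
SAMPLE = '''<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID">
        <year>1984</year>
        <description>None provided.</description>
      </movie>
      <movie favorite="False" title="Back 2 the Future">
        <year>1985</year>
        <description>
          Marty
          McFly
        </description>
      </movie>
    </decade>
  </genre>
</collection>'''


def field_by_title(root, field):
    """Map each movie title to the text of one child element, e.g. 'year'."""
    mapping = {}
    for movie in root.iter("movie"):
        # findtext returns the default when the child element is missing.
        text = movie.findtext(field, default="")
        # Collapse line breaks and runs of spaces into single spaces.
        mapping[movie.get("title")] = " ".join(text.split())
    return mapping


root = ET.fromstring(SAMPLE)
descriptions = field_by_title(root, "description")
years = field_by_title(root, "year")
print(descriptions)
print(years)
```

The years come back as strings; convert them with int() if numeric values are needed before indexing.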