Creating JSON from an XML file using BeautifulSoup

Time: 2018-11-10 09:35:29

Tags: json xml beautifulsoup

I am using a Jupyter notebook running Python 3. My task is to extract data from an XML file and convert it to JSON (and perhaps save the JSON to an output.dat file). I am using BeautifulSoup to navigate between the nodes. I have the following data:

<?xml version='1.0' encoding='UTF-8'?>
<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target
                 shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
  <Term>
    <Title>1 Kilometre Time Trial</Title>
    <Description>test2</Description>
    <RelatedTerms>
      <Term>
        <Title>1 Kilometre TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>One km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
    </RelatedTerms>
  </Term>

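I am walking through the tags so that I can build dictionaries like the ones in the example output. Since I am new to scraping, this has been quite frustrating.

I was able to extract the Description tags. This is the output I expect to see in JSON: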

{
  "thesaurus": [
    {
      "Description": "The standard airgun calibre for international target shooting.",
      "RelatedTerms": [
        {
          "Relationship": "Narrower Term",
          "Title": "Shooting sport equipment"
        }
      ],
      "Title": ".177 (4.5mm) Airgun"
    },
    {
      "Description": "test2",
      "RelatedTerms": [
        {
          "Relationship": "Used For",
          "Title": "1 Kilometre TT"
        },
        {
          "Relationship": "Used For",
          "Title": "1km Time Trial"
        },
        {
          "Relationship": "Used For",
          "Title": "1km Time Trial"
        },
        {
          "Relationship": "Used For",
          "Title": "1km TT"
        },
        {
          "Relationship": "Used For",
          "Title": "One km Time Trial"
        }
      ],
      "Title": "1 Kilometre Time Trial"
    },

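Unlike the Description tag, which I managed to extract, I am not sure how to build a list of dictionaries from the information stored between the RelatedTerms tags. Ideally, I would parse all the tags into a dataframe and then convert the data to JSON.

So, could someone help me work out how to extract the information from the RelatedTerms tags?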

1 Answer:

Answer 0: (score: 1)

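To extract RelatedTerms, first select the top-level Term elements with btree.select('Terms > Term'). You can then loop over them and, inside each one, select the Term elements nested under RelatedTerms.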
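The approach above could look roughly like the following. This is only a minimal sketch, not the answerer's original code: it assumes BeautifulSoup's "xml" parser (which requires lxml to be installed), it assumes the nested selector is 'RelatedTerms > Term', and the names xml_data, clean and terms_to_dict are made up for illustration. The XML is a shortened copy of the data from the question.

```python
import json
from bs4 import BeautifulSoup

# Shortened copy of the XML from the question (one top-level Term only).
xml_data = """<?xml version='1.0' encoding='UTF-8'?>
<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target
                 shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
</Terms>"""


def clean(tag):
    """Hypothetical helper: tag text with whitespace runs collapsed, or None if the tag is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else None


def terms_to_dict(xml_text):
    """Hypothetical helper: build the 'thesaurus' structure from the XML text."""
    # Assumption: the "xml" parser (backed by lxml) keeps tag names case-sensitive.
    btree = BeautifulSoup(xml_text, "xml")
    thesaurus = []
    # Only Term elements directly under Terms, not the nested related ones.
    for term in btree.select("Terms > Term"):
        related = [
            {
                "Relationship": clean(rel.find("Relationship")),
                "Title": clean(rel.find("Title")),
            }
            # Assumed selector for the Term elements inside RelatedTerms.
            for rel in term.select("RelatedTerms > Term")
        ]
        thesaurus.append(
            {
                # recursive=False searches direct children only, so the
                # Title of a related Term is never picked up by mistake.
                "Description": clean(term.find("Description", recursive=False)),
                "RelatedTerms": related,
                "Title": clean(term.find("Title", recursive=False)),
            }
        )
    return {"thesaurus": thesaurus}


result = terms_to_dict(xml_data)
print(json.dumps(result, indent=2))
```

Building plain dictionaries and lists keeps the nesting that the expected JSON needs, which a flat dataframe would not represent directly. To save the result to a file such as output.dat, json.dump(result, f) on an open file handle should work in place of the print call.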