I am using a Jupyter notebook running Python 3. My task is to extract data from an XML file and convert it to JSON (and perhaps save the JSON to an output.dat file). I am using BeautifulSoup to navigate between nodes. I have the following data:
<?xml version='1.0' encoding='UTF-8'?>
<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
  <Term>
    <Title>1 Kilometre Time Trial</Title>
    <Description>test2</Description>
    <RelatedTerms>
      <Term>
        <Title>1 Kilometre TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>One km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
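I am walking through the tags so that I can build dictionaries like the ones in the expected output. Since I am new to text scraping, this has been very frustrating.
I was able to extract the Description tags. This is the output I expect in JSON: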
{
    "thesaurus": [
        {
            "Description": "The standard airgun calibre for international target shooting.",
            "RelatedTerms": [
                {
                    "Relationship": "Narrower Term",
                    "Title": "Shooting sport equipment"
                }
            ],
            "Title": ".177 (4.5mm) Airgun"
        },
        {
            "Description": "test2",
            "RelatedTerms": [
                {
                    "Relationship": "Used For",
                    "Title": "1 Kilometre TT"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km Time Trial"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km Time Trial"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km TT"
                },
                {
                    "Relationship": "Used For",
                    "Title": "One km Time Trial"
                }
            ],
            "Title": "1 Kilometre Time Trial"
        },
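I am not sure how to build a list of dictionaries from the information stored between the RelatedTerms tags, the way I did for the Description tag. Ideally, I would parse all the tags into a data frame and then convert the data to JSON.
So, can someone help me work out how to extract the information from the RelatedTerms tags?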
Answer 0 (score: 1)
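To extract the RelatedTerms, you first have to extract the top-level Term elements using btree.select('Terms > Term'). You can then loop over them and, for each one, select the Term elements nested inside its RelatedTerms.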
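The two-step approach above (select the top-level Term elements first, then the Term elements nested under each one's RelatedTerms) can be sketched as follows. Note that this is only an illustration and it swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup, so that it runs without extra packages. The sample XML is shortened, and the helper names clean, term_to_dict and xml_to_thesaurus are hypothetical, not part of any library.

```python
import json
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the question's XML
# (assumption: the root element is <Terms>).
XML = """<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target
shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
  <Term>
    <Title>1 Kilometre Time Trial</Title>
    <Description>test2</Description>
    <RelatedTerms>
      <Term>
        <Title>1 Kilometre TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
</Terms>"""


def clean(text):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return " ".join((text or "").split())


def term_to_dict(term):
    """Convert one top-level <Term> element into a plain dictionary."""
    # Step 2: only the <Term> elements nested directly under <RelatedTerms>.
    related = [
        {
            "Relationship": clean(rel.findtext("Relationship")),
            "Title": clean(rel.findtext("Title")),
        }
        for rel in term.findall("RelatedTerms/Term")
    ]
    return {
        "Description": clean(term.findtext("Description")),
        "RelatedTerms": related,
        "Title": clean(term.findtext("Title")),
    }


def xml_to_thesaurus(xml_text):
    """Parse the XML and build the {"thesaurus": [...]} structure."""
    root = ET.fromstring(xml_text)
    # Step 1: only the <Term> elements that are direct children of <Terms>.
    return {"thesaurus": [term_to_dict(t) for t in root.findall("Term")]}


result = xml_to_thesaurus(XML)
print(json.dumps(result, indent=2))
```

Here root.findall("Term") plays the role of the 'Terms > Term' selector, and term.findall("RelatedTerms/Term") restricts the second lookup to the nested entries, so top-level and related terms are never mixed up. To save the result, json.dump(result, f) on a file opened for writing (for example output.dat) should be enough.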