How do I extract an inner nested tag that has the same name as an outer tag?

Asked: 2019-11-03 07:41:08

Tags: python python-3.x xml

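I am new to data science and would appreciate your input on this question. When I parse the XML below and call findall() for "Title", I get the value of every Title element in the document. What I actually want is only the value of the "Title" tag inside RelatedTerms.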

Can anyone help? Thanks,

<?xml version="1.0" encoding="utf-8"?>
<Terms>
    <Term>
        <Title>.177 (4.5mm) Airgun</Title>
        <Description>The standard airgun calibre for international target shooting.
        </Description>
        <RelatedTerms>
            <Term>
                <Title>Shooting sport equipment</Title>
                <Relationship>Narrower Term</Relationship>
            </Term>
        </RelatedTerms>
    </Term>
</Terms>

2 answers:

Answer 0 (score: 0)

Using BeautifulSoup:

from bs4 import BeautifulSoup
temp = """<Terms>
            <Term>
            <Title>.177 (4.5mm) Airgun</Title>
            <Description>The standard airgun calibre for international target shooting. 
            </Description>
            <RelatedTerms>
            <Term>
            <Title>Shooting sport equipment</Title>
            <Relationship>Narrower Term</Relationship>
            </Term>
            </RelatedTerms>
            </Term>
            </Terms>"""

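# With "lxml", BeautifulSoup parses the text as HTML and lowercases every tag name.
temp = BeautifulSoup(temp, "lxml")
# Search with lowercase names: narrow to RelatedTerms first, then collect its titles.
s = temp.find('relatedterms')
print(s.find_all('title'))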

Output:

[<title>Shooting sport equipment</title>]

Answer 1 (score: 0)

Using xml.etree.ElementTree:

import xml.etree.ElementTree as ET

tree = ET.parse("file.xml")  # Replace "file.xml" with the name of your XML file
root = tree.getroot()

for related_terms in root.findall("./Term/RelatedTerms"):
    for title_internal in related_terms.findall("./Term/Title"):
        print(title_internal.text)

Output:

Shooting sport equipment

In tree = ET.parse("file.xml"), replace file.xml with the name of your XML file.
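
For comparison, here is a minimal sketch of why an unscoped search returns every Title and how anchoring the path on RelatedTerms narrows it. It assumes the XML from the question is held in a string (the name XML_TEXT is made up for this example, and the XML declaration is left out for brevity); the names all_titles, related_titles and related_by_term are likewise only illustrative.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined so the sketch runs without a file.
# XML_TEXT is a hypothetical name used only for this example.
XML_TEXT = """<Terms>
    <Term>
        <Title>.177 (4.5mm) Airgun</Title>
        <Description>The standard airgun calibre for international target shooting.
        </Description>
        <RelatedTerms>
            <Term>
                <Title>Shooting sport equipment</Title>
                <Relationship>Narrower Term</Relationship>
            </Term>
        </RelatedTerms>
    </Term>
</Terms>"""

root = ET.fromstring(XML_TEXT)

# ".//Title" searches at every depth, so it matches outer and nested titles alike.
all_titles = [t.text for t in root.findall(".//Title")]

# Anchoring the path on RelatedTerms keeps only the nested titles.
related_titles = [t.text for t in root.findall(".//RelatedTerms/Term/Title")]

# Group the related titles under the title of the top-level Term that owns them.
related_by_term = {
    term.findtext("Title"): [t.text for t in term.findall("RelatedTerms/Term/Title")]
    for term in root.findall("Term")
}

print(all_titles)
print(related_titles)
print(related_by_term)
```

The grouped form may be handy if the real file contains many top-level Term elements and you need to know which term each related title belongs to.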