Question

I have an XML document that looks like this:

<?xml version='1.0' encoding='utf8'?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation><x> </x><label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg><x>, </x><affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
<affiliation><x> </x><label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg><x>, </x><affnorg>College of Optoelectronic Engineering</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
</all>

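The task is to remove all <x> tags inside the affiliation tags only, while keeping their text. With ElementTree I can remove a tag, but that also removes its text, and I want that text to stay in the parent tag, so that the new XML looks like this: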

<?xml version='1.0' encoding='utf8'?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation> <label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg>, <affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
<affiliation> <label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg>, <affnorg>College of Optoelectronic Engineering</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
</all>

Answer 1

With BeautifulSoup, you can use the unwrap() function:

data = '''<?xml version='1.0' encoding='utf8'?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation><x> </x><label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg><x>, </x><affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
<affiliation><x> </x><label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg><x>, </x><affnorg>College of Optoelectronic Engineering</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
</all>'''

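from bs4 import BeautifulSoup

# The 'xml' parser requires the lxml package to be installed
soup = BeautifulSoup(data, 'xml')

# Replace each <x> inside <affiliation> with its own contents;
# <x> tags elsewhere (e.g. in <articletitle>) are left alone
for x in soup.select('affiliation x'):
    x.unwrap()

print(soup)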

Prints:

<?xml version="1.0" encoding="utf-8"?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation> <label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg>, <affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
<affiliation> <label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg>, <affnorg>College of Optoelectronic Engineering</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
</all>
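
Since the question mentions ElementTree, here is a rough sketch of how the same unwrapping might be done with the standard library alone. ElementTree keeps the text that follows an element in that element's tail attribute, and remove() drops the tail together with the element, so the text has to be moved to the previous sibling's tail (or to the parent's text) first. The helper name unwrap_children and the shortened sample string are my own, not from the question, and the sketch assumes the <x> elements contain only text and no child elements.

```python
import xml.etree.ElementTree as ET


def unwrap_children(parent, tag):
    """Remove direct children named `tag`, keeping their text in `parent`.

    Hypothetical helper; assumes the removed elements have no children.
    """
    for child in list(parent):  # iterate over a copy, since we mutate parent
        if child.tag != tag:
            continue
        # Text inside the element plus the text that follows it
        kept = (child.text or "") + (child.tail or "")
        idx = list(parent).index(child)
        if idx == 0:
            # No previous sibling: the text belongs to the parent itself
            parent.text = (parent.text or "") + kept
        else:
            # Otherwise it goes after the previous sibling
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + kept
        parent.remove(child)


# Shortened version of the XML from the question
sample = (
    "<all><articletitle>text1<x> </x></articletitle>"
    '<affiliation><x> </x><label id="aff1">12</label>'
    "<affnorg>Shenzhen University</affnorg><x>, </x>"
    "<affncountry>PR China</affncountry><x>.</x></affiliation></all>"
)

root = ET.fromstring(sample)
for aff in root.iter("affiliation"):
    unwrap_children(aff, "x")

result = ET.tostring(root, encoding="unicode")
print(result)
```

Only the direct <x> children of each affiliation are touched, so the <x> inside articletitle stays in place, which matches the desired output in the question.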
