Selecting a tag with a dot in its name using BeautifulSoup

Date: 2019-06-03 07:21:34

Tags: python xml python-3.x beautifulsoup lxml

How can I use BeautifulSoup to select and modify the tag <Tagwith.dot> and its text? If BeautifulSoup cannot do this, is lxml the next best library for editing and creating XML documents?

from bs4 import BeautifulSoup as bs

stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""
soup = bs(stra)

Desired XML:

<body>
<Tagwith.dot>Edited text</Tagwith.dot>
</body>

2 answers:

Answer 0 (score: 2)

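When parsing with an HTML parser such as html.parser, BS4 converts all tag names to lowercase. The code below works; pass the tag name to find_all in lowercase.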

from bs4 import BeautifulSoup as bs

stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""
soup = bs(stra, 'html.parser')

print(soup.find_all('tagwith.dot'))
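The snippet above only selects the tag. As a minimal sketch of the modification step the question asks about, assuming html.parser is used (so the tag name is lowercased in the output as well), the text can be replaced by assigning to the tag's .string attribute. find and find_all compare the name as a plain string, so the dot needs no escaping:

```python
from bs4 import BeautifulSoup as bs

stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""

# html.parser lowercases tag names, so search for the lowercase form
soup = bs(stra, 'html.parser')
tag = soup.find('tagwith.dot')

# Replace the tag's text content
tag.string = 'Edited text'

edited = str(soup)
print(edited)
```

Note that the serialized result contains <tagwith.dot>, not <Tagwith.dot>. If the original casing must be preserved, an XML-aware parser is the better fit; BeautifulSoup's 'xml' parser option is one candidate, though it requires lxml to be installed.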

Answer 1 (score: 1)

You can do this with xml.etree.ElementTree, which preserves the case of the tag name:

import xml.etree.ElementTree as ET

stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""

# Read the XML string and parse it into an Element
xml_obj = ET.fromstring(stra)

# Iterate through the direct children of the root element
for elem in xml_obj:
    # If the tag is found, modify its text
    if elem.tag == 'Tagwith.dot':
        elem.text = 'Edited text'

# Print the updated XML as a string
print(ET.tostring(xml_obj).decode())

The output will be:

<body>
<Tagwith.dot>Edited text</Tagwith.dot>
</body>
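
The loop above only visits direct children of the root element. As a hedged variation, assuming the tag may be nested at any depth (the <section> wrapper below is a hypothetical addition for illustration, not part of the question), Element.iter can search the whole tree by tag name:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the dotted tag is nested one level deeper
stra = """
<body>
<section><Tagwith.dot>Text inside tag with dot</Tagwith.dot></section>
</body>"""

root = ET.fromstring(stra)

# iter() walks the entire subtree and matches the tag name exactly,
# so the dot and the capital letter need no special handling
for elem in root.iter('Tagwith.dot'):
    elem.text = 'Edited text'

# encoding='unicode' returns a str instead of bytes
result = ET.tostring(root, encoding='unicode')
print(result)
```

lxml.etree offers a largely compatible API, so the same approach should carry over if lxml is preferred.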