I have the following string that I am trying to extract data from:
<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>
I am trying to get "Chris M" and the other authors' names with:
from bs4 import BeautifulSoup

soup = BeautifulSoup(response, "lxml")
items = soup.findAll("item")
for i in items:
    author = i.find('dc:creator')
    print(author)
Output:
<dc:creator></dc:creator>
How can I get the name out of the tag?
Answer 0 (score: 0)
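This worked for me using Python 3 (https://repl.it/languages/python3): specify xml as the parser. Note that BeautifulSoup's "xml" parser requires lxml to be installed.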
import bs4 as bs
content="""
<collection>
<item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
<item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</collection>
"""
soup = bs.BeautifulSoup(content, 'xml')
items = soup.findAll("item")
for i in items:
    author = i.find('creator')
    print(author.string)
Output:
Chris M
Harris A
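For comparison, here is a minimal sketch of the same "treat it as XML" idea using only the standard library's xml.etree.ElementTree instead of BeautifulSoup. It assumes the full feed declares the dc prefix on its root element (a real feed has to, or it is not namespace-well-formed) and that the URI is the usual Dublin Core one, http://purl.org/dc/elements/1.1/. The function name get_authors is hypothetical.

```python
from typing import List
import xml.etree.ElementTree as ET

# Assumption: the feed binds the "dc" prefix to the standard Dublin Core URI.
DC_NS = "http://purl.org/dc/elements/1.1/"


def get_authors(feed_xml: str) -> List[str]:
    """Hypothetical helper: return the text of every <dc:creator> inside an <item>."""
    root = ET.fromstring(feed_xml)
    authors = []
    for item in root.iter("item"):
        # ElementTree addresses namespaced tags as "{namespace-uri}localname".
        creator = item.find(f"{{{DC_NS}}}creator")
        # The XML parser unwraps <![CDATA[...]]>, so .text is the plain name.
        if creator is not None and creator.text:
            authors.append(creator.text.strip())
    return authors


feed = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>
<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>
<item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</channel></rss>"""

print(get_authors(feed))  # ['Chris M', 'Harris A']
```

Because a real XML parser handles the CDATA section, no special CData handling is needed here; the trade-off is that ElementTree, unlike BeautifulSoup, will reject input that is not well-formed XML.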
Answer 1 (score: 0)
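BeautifulSoup represents a CDATA section as a CData object (a subclass of NavigableString), so you can check for instances of that class: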
>>> from bs4 import BeautifulSoup, CData
>>> text = """<item>
... <dc:creator><![CDATA[Chris M]]></dc:creator>
... <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
... </item>"""
>>> soup = BeautifulSoup(text, "html.parser")  # the "lxml" parser dropped the CDATA in the question
>>> for item in soup.find_all(string=True):
...     if isinstance(item, CData):
...         print(item)
...
Chris M