I have some text that looks like this:
<item>
<title>What Music Do You Build Robots to?</title>
<dc:creator><![CDATA[@TaranMayer TaranMayer ]]></dc:creator>
<description><![CDATA[ <aside class="quote no-group" data-username="DanMantz" data-post="34" data-topic="84065" data-full="true">
<div class="title">
<div class="quote-controls"></div>
<img alt="" width="20" height="20" src="https://www.vexforum.com/user_avatar/www.vexforum.com/danmantz/40/2285_2.png" class="avatar"> DanMantz:</div>
<blockquote>
<p>Classic Rock and Motown. I didn’t even consider that there are other options… <img src="https://www.vexforum.com/images/emoji/apple/slight_smile.png?v=9" title=":slight_smile:" class="emoji" alt=":slight_smile:"></p>
</blockquote>
</aside>
<p>This implies that you do indeed build robots. May we see some of your creations?</p> ]]></description>
<link>https://www.vexforum.com/t/what-music-do-you-build-robots-to/84065/35</link>
<pubDate>Wed, 02 Sep 2020 17:24:19 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-669073</guid>
</item>
Using bs4, I want to get the text of everything inside the <description> tag except the content of the <blockquote> tag. I want to get this:
This implies that you do indeed build robots. May we see some of your creations?
How can I do this? I looked for help but couldn't find what I needed.
Answer 0 (score: 1)
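To get the text you want, you can use the .extract() method. The content of <description> is wrapped in CDATA, so it has to be parsed as a second soup before the unwanted tags can be removed: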
from bs4 import BeautifulSoup, CData
txt = """<item>
<title>What Music Do You Build Robots to?</title>
<dc:creator><![CDATA[@TaranMayer TaranMayer ]]></dc:creator>
<description><![CDATA[ <aside class="quote no-group" data-username="DanMantz" data-post="34" data-topic="84065" data-full="true">
<div class="title">
<div class="quote-controls"></div>
<img alt="" width="20" height="20" src="https://www.vexforum.com/user_avatar/www.vexforum.com/danmantz/40/2285_2.png" class="avatar"> DanMantz:</div>
<blockquote>
<p>Classic Rock and Motown. I didn’t even consider that there are other options… <img src="https://www.vexforum.com/images/emoji/apple/slight_smile.png?v=9" title=":slight_smile:" class="emoji" alt=":slight_smile:"></p>
</blockquote>
</aside>
<p>This implies that you do indeed build robots. May we see some of your creations?</p> ]]></description>
<link>https://www.vexforum.com/t/what-music-do-you-build-robots-to/84065/35</link>
<pubDate>Wed, 02 Sep 2020 17:24:19 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-669073</guid>
</item>"""
# load main soup:
soup = BeautifulSoup(txt, "html.parser")
# find the CData node inside <description>
# (string= is the current name of the older text= argument)
desc = soup.find("description").find_next(string=lambda t: isinstance(t, CData))
# create new soup
desc = BeautifulSoup(desc, "html.parser")
# extract tags we don't want
for a in desc.select("aside"):
    a.extract()
# print the text:
print(desc.text.strip())
Prints:
This implies that you do indeed build robots. May we see some of your creations?
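If you need to do this for many items, the "remove the quote, then read the text" step can be wrapped in a small helper. This is only a sketch, not part of the original answer: the function name `text_without_quotes` and the shortened `sample` string are made up for illustration, and it assumes you already have the HTML from inside the CDATA as a plain string. It uses `.decompose()` instead of `.extract()`, which also removes the tag but does not return it.

```python
from bs4 import BeautifulSoup


def text_without_quotes(description_html: str) -> str:
    """Return the text of a post body, ignoring quoted replies.

    Hypothetical helper: expects the HTML that was inside the CDATA
    section of <description>, already pulled out as a string.
    """
    soup = BeautifulSoup(description_html, "html.parser")
    # Quoted posts are wrapped in <aside class="quote ...">; drop them entirely.
    for quote in soup.select("aside.quote"):
        quote.decompose()
    # Join the remaining text nodes with a space and trim whitespace.
    return soup.get_text(" ", strip=True)


# Shortened stand-in for the description body from the question (assumption).
sample = (
    '<aside class="quote no-group" data-username="DanMantz">'
    '<div class="title">DanMantz:</div>'
    "<blockquote><p>Classic Rock and Motown.</p></blockquote>"
    "</aside>"
    "<p>This implies that you do indeed build robots. "
    "May we see some of your creations?</p>"
)

result = text_without_quotes(sample)
print(result)
```

Selecting `aside.quote` rather than every `aside` is a design choice that leaves any other `<aside>` in a post untouched; use `"aside"` if you want the same behaviour as the code above.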