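Sorry, I have asked a similar question before (How can i crawl web data that not in tags), but I still have a problem with data that is not inside tags. This question is a little different from the one I asked.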
<div class="bbs" id="main-content">
<div class="metaline">
<span class="article-meta-tag">
author
</span>
<span class="article-meta-value">
Jorden
</span>
</div>
<div class="metaline">
<span class="article-meta-tag">
board
</span>
<span class="article-meta-value">
NBA
</span>
</div>
I am here
</div>
I only need:
I am here
Answer 0 (score: 1)
The string is a direct child of the main div and has type NavigableString, so you can loop over div.children and filter the nodes by their type:
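from bs4 import BeautifulSoup, NavigableString
main = soup.find("div", {'id': 'main-content'})
# Keep only direct children that are bare text nodes (NavigableString),
# dropping the whitespace-only strings between the nested <div> tags
[x.strip() for x in main.children if isinstance(x, NavigableString) and x.strip()]
# ['I am here']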
Data (defines the soup used above):
soup = BeautifulSoup("""<div class="bbs" id="main-content">
<div class="metaline">
<span class="article-meta-tag">
author
</span>
<span class="article-meta-value">
Jorden
</span>
</div>
<div class="metaline">
<span class="article-meta-tag">
board
</span>
<span class="article-meta-value">
NBA
</span>
</div>
I am here
</div>""", "html.parser")
Answer 1 (score: 0)
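from bs4 import BeautifulSoup
soup = BeautifulSoup(that_html, "html.parser")  # that_html: the question's markup
div_tag = soup.div
# Caution: .string is None when a tag has more than one child, as this div does,
# so for the markup above this returns None rather than "I am here"
required_string = div_tag.string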
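For the markup in the question, div_tag.string evaluates to None, because Beautiful Soup's .string only returns a value when a tag has exactly one child, and this div has several. The sketch below, assuming bs4 4.4 or later (where find_all accepts string=), demonstrates that and shows one way to keep the short soup.div lookup while still getting the text: find_all(string=True, recursive=False) returns only the text nodes that are direct children of the div. HTML and texts are illustrative names, and the markup is a condensed copy of the question's.

```python
from bs4 import BeautifulSoup

# Condensed copy of the markup from the question
HTML = (
    '<div class="bbs" id="main-content">'
    '<div class="metaline"><span class="article-meta-tag">author</span>'
    '<span class="article-meta-value">Jorden</span></div>'
    '<div class="metaline"><span class="article-meta-tag">board</span>'
    '<span class="article-meta-value">NBA</span></div>'
    "\nI am here\n"
    "</div>"
)

soup = BeautifulSoup(HTML, "html.parser")
div_tag = soup.div  # the first <div> in the document, i.e. #main-content

# .string gives up when a tag has more than one child; this div has three
print(div_tag.string)  # None

# recursive=False restricts the search to direct children, so text nested
# inside the inner <div>/<span> tags is skipped
texts = [s.strip() for s in div_tag.find_all(string=True, recursive=False) if s.strip()]
print(texts)  # ['I am here']
```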