How to crawl web data that is not inside a tag (same class names)

Time: 2017-06-04 22:04:10

Tags: python beautifulsoup

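Sorry, I have asked a question like this before, but I still have a problem with data that is not inside a tag. This is slightly different from my earlier question (How can i crawl web data that not in tags)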

<div class="bbs" id="main-content">
    <div class="metaline">
        <span class="article-meta-tag">
             author
        </span>
        <span class="article-meta-value">
             Jorden 
        </span>
    </div>
    <div class="metaline">
        <span class="article-meta-tag">
            board
        </span>
        <span class="article-meta-value">
            NBA
        </span>
    </div>

I am here

</div>

I only need:

I am here

2 answers:

Answer 0: (score: 1)

The string is a direct child of the main div and has type NavigableString, so you can loop over div.children and filter by node type:

from bs4 import BeautifulSoup, NavigableString
[x.strip() for x in soup.find("div", {'id': 'main-content'}).children if isinstance(x, NavigableString) and x.strip()]
# [u'I am here']

Data:

soup = BeautifulSoup("""<div class="bbs" id="main-content">
    <div class="metaline">
        <span class="article-meta-tag">
             author
        </span>
        <span class="article-meta-value">
             Jorden 
        </span>
    </div>
    <div class="metaline">
        <span class="article-meta-tag">
            board
        </span>
        <span class="article-meta-value">
            NBA
        </span>
    </div>
I am here
</div>""", "html.parser")
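
Putting the snippet and the data together, a self-contained sketch might look like the following. The helper name `direct_text` is hypothetical (it is not part of BeautifulSoup), and the sketch assumes the target div can be located by its id, as in the question's HTML.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = """<div class="bbs" id="main-content">
    <div class="metaline">
        <span class="article-meta-tag">author</span>
        <span class="article-meta-value">Jorden</span>
    </div>
    <div class="metaline">
        <span class="article-meta-tag">board</span>
        <span class="article-meta-value">NBA</span>
    </div>
I am here
</div>"""


def direct_text(html, div_id):
    """Hypothetical helper: return the non-empty strings that are
    direct children of the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"id": div_id})
    if div is None:  # no div with that id in the document
        return []
    return [
        child.strip()
        for child in div.children
        # keep text nodes only, and skip whitespace-only ones
        if isinstance(child, NavigableString) and child.strip()
    ]


print(direct_text(HTML, "main-content"))
```

Text nested inside the inner divs and spans is skipped because only the direct children of the outer div are inspected.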

Answer 1: (score: 0)

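from bs4 import BeautifulSoup
soup = BeautifulSoup(that_html, "html.parser")  # that_html: the HTML string from the question
div_tag = soup.div
# Note: .string is None when a tag has more than one child, as the main div does here.
required_string = div_tag.string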

Go through this documentation.
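
As a rough illustration of how `.string` behaves on a tag with several children, and of one possible alternative, the sketch below uses a shortened version of the question's HTML (an assumption made for brevity). It also assumes a bs4 version that accepts the `string` argument to `find_all`; older releases used `text` instead.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's HTML: one child tag plus loose text.
html = '<div id="main-content"><span>author</span>\nI am here\n</div>'
soup = BeautifulSoup(html, "html.parser")
div_tag = soup.find("div", id="main-content")

# The div has two children (a span and a string), so .string is None.
print(div_tag.string)

# recursive=False limits the search to direct children;
# string=True matches text nodes rather than tags.
direct_strings = [
    s.strip()
    for s in div_tag.find_all(string=True, recursive=False)
    if s.strip()
]
print(direct_strings)
```

This gives the same result as filtering `div.children` by `NavigableString`, expressed through `find_all`.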