Beautiful Soup: recursively getting text from nested divs

Time: 2017-11-24 07:21:10

Tags: python beautifulsoup

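I am unable to get the data out of a set of nested divs.

The divs are nested several levels deep, and I need the data formatted correctly.

I wrote some code with the bs4 module, but I get this error:

BeautifulSoup: AttributeError: 'NavigableString' object has no attribute 'name'

Please help me!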

My HTML:

<div id="new">
    <div id="newDat">
        <div class="Data">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>

            </div>
        </div>
        <div class="DataOne">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>

            </div>
        </div>
        <div class="DataTwo">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>  
            </div>
        </div>
        <div class="DataThree">
            <div class="DataNew">
                <div class="DataNew new">
                    <div class="Data Left">
                        <div class="name"><a class="name" href="">Jack Daniels</a></div>
                        <div class="details"><span class="loc">Barcelona</span></div>
                        <div class="header"><a class="looking"> Looking for meeting new people</a></div>
                        <div class="ideas"><a class="ideas">I have new ideas</a></div>
                        <div class="profile"> <em class="profilss"></em>MS in cs<br></div>

                    </div>
                    <div class="Data Right">
                        <a class="phone"><span class="txt">+123123123123123231</span></a>
                    </div>
                </div>

            </div>
        </div>
    </div>
</div>

My Beautiful Soup code:

    li = page.find('div', {'id': 'new'})
    for tag in li:
        for i in tag.find_all("div", {"class": "name"}):
            print i.getText()
            break

        for i in tag.find_all("div", {"class": "details"}):
            print i.getText()
            break

        for i in tag.find_all("div", {"class": "header"}):
            print i.getText()
            break


        for i in tag.find_all("div", {"class": "ideas"}):
            print i.getText()
            break


        for i in tag.find_all("div", {"class": "profile"}):
            print i.getText()
            break

        for i in tag.find_all("div", {"class": "phone"}):
            print i.getText()
            break

I want output like this:

    Div one
    Name : Jack Daniels
    Details : Barcelona
    header : Looking for meeting new people
    ideas : I have new ideas
    profile: MS in cs
    tel : +123123123123123231
    Div two
    Name : Jack Daniels
    Details : Barcelona
    header : Looking for meeting new people
    ideas : I have new ideas
    profile: MS in cs
    tel : +123123123123123231

and so on. If there are 100 divs inside, I need this output for each of them.

1 answer:

Answer 0: (score: 1)

You can do it like this. It prints the data for each record div.

from bs4 import BeautifulSoup

soup = BeautifulSoup(b, "html.parser")  # b is the HTML string

# "div.DataNew.new" matches only the inner record divs. Searching for the
# class "DataNew" alone also matches the outer wrappers, so every record
# would be printed twice.
rows = soup.select("div.DataNew.new")
for tag in rows:
    for css_class in ("name", "details", "header", "ideas", "profile", "Data Right"):
        # find() returns the first matching div, or None if there is none
        div = tag.find("div", {"class": css_class})
        if div is not None:
            print(div.get_text(strip=True))
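
To get the labelled layout the question asks for, a minimal sketch along the following lines may help. Iterating directly over a Tag (as in `for tag in li`) walks its children, and those include the whitespace between elements as NavigableString objects, which is a likely source of the AttributeError; `select()` returns only Tag objects and so avoids that. The names `FIELDS`, `extract_records`, `format_records` and `sample` are hypothetical helpers introduced here, not part of bs4, and the block numbers the records as "Div 1", "Div 2" rather than spelling the numbers out.

```python
from bs4 import BeautifulSoup

# Hypothetical label -> CSS selector mapping, based on the question's markup.
FIELDS = [
    ("Name", "div.name"),
    ("Details", "div.details"),
    ("header", "div.header"),
    ("ideas", "div.ideas"),
    ("profile", "div.profile"),
    ("tel", "a.phone"),
]


def extract_records(html):
    """Return one dict per record div, mapping each label to its text."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # select() yields only Tag objects, so the whitespace text nodes
    # (NavigableString) between the divs are never visited.
    for row in soup.select("div.DataNew.new"):
        record = {}
        for label, selector in FIELDS:
            node = row.select_one(selector)
            # A missing field becomes an empty string instead of raising.
            record[label] = node.get_text(strip=True) if node is not None else ""
        records.append(record)
    return records


def format_records(records):
    """Render the records as 'Div N' blocks of 'label : value' lines."""
    lines = []
    for number, record in enumerate(records, start=1):
        lines.append("Div %d" % number)
        for label, _ in FIELDS:
            lines.append("%s : %s" % (label, record[label]))
    return "\n".join(lines)


# A single record copied from the question's HTML, used as sample input.
sample = """
<div id="new">
  <div class="Data"><div class="DataNew"><div class="DataNew new">
    <div class="Data Left">
      <div class="name"><a class="name" href="">Jack Daniels</a></div>
      <div class="details"><span class="loc">Barcelona</span></div>
      <div class="header"><a class="looking"> Looking for meeting new people</a></div>
      <div class="ideas"><a class="ideas">I have new ideas</a></div>
      <div class="profile"> <em class="profilss"></em>MS in cs<br></div>
    </div>
    <div class="Data Right">
      <a class="phone"><span class="txt">+123123123123123231</span></a>
    </div>
  </div></div></div>
</div>
"""

print(format_records(extract_records(sample)))
```

Collecting the records into dicts first keeps extraction separate from printing, so the same data could be written to CSV or JSON instead; with 100 record divs the loop simply produces 100 blocks.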