BeautifulSoup returns duplicate data instead of iterating

Time: 2018-12-20 07:12:44

Tags: python-3.x beautifulsoup

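I am trying to extract data from a table on the following website. However, the data stored in the table is hard to pin down, so I tried to retrieve it by id. (link to reference)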

def extractDataFromRow(_url):
    l = []  # result list (assumed; not defined in the original snippet)
    try:
        for table_container in _url.find_all('table', {'cellspacing': '1'}):
            # get data from topic title in table cell
            topic_title = table_container.a.text.replace("\n", "")
            if topic_title is not None:
                # get data from topic description in table cell
                for row_container in table_container.find_all('div', {'class': 'desc'}):
                    topic_description = row_container.text
                    # check details
                    if topic_title and topic_description:
                        d = {'Title': topic_title, 'Description': topic_description}
                        l.append(d)
        return l
    except:
        d = None

So this is what I get:

{'Description': 'hi there', 'Title': 'Greetings'}, {'Description': "it's nice to meet you", 'Title': 'Greetings'}

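So while the Description changes, the Title is repeated over and over. I am wondering whether this is an indentation problem or whether the same row is being read repeatedly. Below is the table I am trying to extract from: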

<table cellspacing="1">
    <tr> 
    #cluttered <Th> tags
    <!-- Forum page unique top -->
    <!--IBF.ANNOUNCEMENTS--><tr>
   <td class="darkrow1" colspan="8"><b>Forum Topics</b></td>
</tr><!-- Begin Topic Entry 4709448 -->
<tr> 
    <td align></td>
    <td align>
        <div>
            <div style="float:left">
            <a href="/topic/4709448" title="This topic was started: Dec 17 2018, 12:53 PM">
               Greetings</a> 
            </div>             
            <div style="float:right;"> <a href=> </a> </div><br/>
            <div class="desc" style="float:left; clear:left;">It&#39;s Hi there</div>
        </div>
    </td>
    <td align='center' class="row2">
     <a href="JS.script">4</a>    </td><td align="center" class="row2"><a href='link'>Shavon Lim</a></td>
    <td align="center" class="row2">
        </script-->
        152
        </td>
    <td class="row2">
    </td></tr>

1 Answer:

Answer 0: (score: 1)

You can get the data from a different element. I changed your root element, and I had to modify the code a little, but you get the idea.

from bs4 import BeautifulSoup
import requests

request = requests.get('https://forum.lowyat.net/ReviewsandGuides')
soup = BeautifulSoup(request.text, 'lxml')

for container in soup.find_all('td', {'class': 'row1','valign':'middle'}):
    # get data from topic title in table cell
    topic = container.select_one('a[href^="/topic/"]').text
    description = container.select_one('div.desc').text

    if topic and description:
        d = {'Title': topic,'Description': description}
        print(d)

Here is the result.

{'Title': '\nThinkware F800 Pro', 'Description': 'The new King of the Beast dashcam!'}
{'Title': '\nMSI GE75 Review', 'Description': 'The Thin Bezel 17 Inch Laptop'}
{'Title': '\nMSI GS65 Stealth review', 'Description': "It's compact and pack with power"}
{'Title': '\nTineco A11 Master', 'Description': 'Better than Dyson V8'}
{'Title': '\nMSI P65 review', 'Description': 'Slim and powerful creator PC'}
{'Title': '\n[REVIEW] Asus RT-N14UHP', 'Description': 'High Power Router/AP/Repeater'}
{'Title': '\nTronsmart Trim 10000mAh USB-C Power Bank', 'Description': 'Ultra Slim goodness'}
{'Title': '\nASUS RT-AC86U', 'Description': 'Feature-packed advanced wireless router'}
{'Title': '\nXiaomi 70Mai Pro', 'Description': 'A new discreet dashcam'}
{'Title': '\nEdifier S360DB Hi-Res 2.1 Speaker System Review', 'Description': 'Successor of Edifier S350DB 2.1 Speaker'}
{'Title': '\nArmaggeddon Nuke 11 Ultimate 7.1 RGB Gaming Headse', 'Description': 'Affordable gaming headset with RGB light'}
{'Title': '\nArmaggeddon Nuke 7 Ultimate 7.1 RGB Gaming Headset', 'Description': 'Affordable gaming headset with RGB light'}
{'Title': '\nDDPai Mini3', 'Description': 'Dashcam with built-in eMMC 5.1'}
{'Title': '\nXiaomi Mijia 1S', 'Description': 'Budget STARVIS CMOS Car Recorder'}
{'Title': '\nX96 Max', 'Description': 'Budget Amlogic S905X2 TV Box'}
{'Title': '\nXiaomi Mi Drone 4K', 'Description': 'Value for money DJI killer?'}
{'Title': '\nAsus RT-AX88U', 'Description': 'Ultrafast WiFi 6 is here!!!'}
{'Title': '\nLenovo MIIX 320', 'Description': 'need your advise'}
{'Title': '\nViofo A129', 'Description': 'The highly anticipated dual dashcam'}
{'Title': '\nSonicGear KBX900 RGB Boombox Speaker Review', 'Description': 'TWS (True Wireless Stereo) speaker!'}
{'Title': '\nEdifier e235 Luna Eclipse 2.1 Speaker Review', 'Description': 'Stylish 2.1 speaker for home/office'}
{'Title': '\nAlfawise G70', 'Description': 'Super Capacitor Dashcam'}
{'Title': '\nBlackVue DR900S-2CH', 'Description': 'Best 4K Ultra HD dashcam?'}
{'Title': '\nXiaomi Roborock S50 Vacuum Robot vs Human', 'Description': 'The Definitive Test'}
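
As for why the title repeated in the question: `table_container.a` is shorthand for the first `<a>` tag anywhere inside that table, so it was looked up once per table and then reused for every `div.desc`. Below is a minimal offline sketch of that behavior and of the per-cell lookup used in the answer. The sample markup and the function names `titles_from_table` and `titles_per_row` are made up for illustration, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on the table in the question.
SAMPLE_HTML = """
<table cellspacing="1">
  <tr><td>
    <a href="/topic/1">Greetings</a>
    <div class="desc">hi there</div>
  </td></tr>
  <tr><td>
    <a href="/topic/2">Farewell</a>
    <div class="desc">see you soon</div>
  </td></tr>
</table>
"""


def titles_from_table(soup):
    """Mimic the question: the title is looked up once per table."""
    rows = []
    for table in soup.find_all('table', {'cellspacing': '1'}):
        # table.a is the FIRST <a> in the whole table, not one per row
        title = table.a.text.strip()
        for desc in table.find_all('div', {'class': 'desc'}):
            rows.append({'Title': title, 'Description': desc.text.strip()})
    return rows


def titles_per_row(soup):
    """Look up the title and the description inside the same <td>."""
    rows = []
    for cell in soup.find_all('td'):
        link = cell.select_one('a[href^="/topic/"]')
        desc = cell.select_one('div.desc')
        # skip cells that do not contain both pieces
        if link is not None and desc is not None:
            rows.append({'Title': link.text.strip(),
                         'Description': desc.text.strip()})
    return rows


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(titles_from_table(soup))  # both entries carry the title 'Greetings'
print(titles_per_row(soup))     # each entry carries its own title
```

Calling `.strip()` on the link text also removes the leading `\n` that shows up in the titles printed above.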