find_all returns no results for text in a specific class

Time: 2017-06-11 05:24:49

Tags: python html beautifulsoup

I'm trying to get all the text inside a specific class, but it returns an empty list:

>>> soup.find_all(' dataRow odd')
[]

HTML:

<tr class=" dataRow odd" onblur="if (window.hiOff){hiOff(this);}" 
onfocus="if (window.hiOn){hiOn(this);}" onmouseout="if (window.hiOff){hiOff(this);}" 
onmouseover="if (window.hiOn){hiOn(this);}"><td class='actionColumn'>&nbsp;</td><th scope="row" class=" dataCell  ">
<a href="/a0I9000000hHJIN?btdid=0019000001piFE9">textexttext</a></th><td class=" dataCell  ">Active</td><td class=" dataCell  ">
<a href="/a089000001nOvG8?btdid=0019000001piFE9">BIG TEXT</a></td>
<td class=" dataCell  ">TEXTTEXTTEXT</td><td class=" dataCell  ">TEXTTEXTTEXT</td>
<td class=" dataCell  "> </td><td class=" dataCell  ">&nbsp;</td><td class=" dataCell  DateElement">8/02/2019</td></tr>

I'm trying to scrape all the text in that markup. But when I run my code, it returns [], as if it didn't find anything.

import requests, bs4, re
html = open('2.html')
soup = bs4.BeautifulSoup(exampleFile, "lxml")
duh = soup .find_all(' dataRow odd')
print (duh)

Where am I going wrong? Also, ideally the code would print each separate piece of text on its own line.

1 answer:

Answer 0: (score: 0)

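find_all(' dataRow odd') returns [] because a bare string is treated as a tag name, not a class; the class has to be passed as a filter. Querying for dataRow odd then yields the surrounding <tr>, which contains all the other elements (<td>, <a>, and so on). You can grab just the text by accessing the .text attribute, which gives you one big chunk of text instead of HTML: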

for d in duh:
    print(d.text)
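
To make the difference concrete, here is a minimal sketch. It assumes a trimmed-down, hypothetical copy of the row from the question (the SAMPLE string and the /x link are made up for illustration) and uses a CSS selector as one way to filter on both classes:

```python
import bs4

# Hypothetical, shortened version of the <tr> from the question.
SAMPLE = (
    '<table><tr class=" dataRow odd">'
    '<th scope="row" class=" dataCell  "><a href="/x">textexttext</a></th>'
    '<td class=" dataCell  ">Active</td>'
    '<td class=" dataCell  DateElement">8/02/2019</td>'
    '</tr></table>'
)

soup = bs4.BeautifulSoup(SAMPLE, "html.parser")

# A bare string is a tag name, so this searches for tags *named* ' dataRow odd'.
by_name = soup.find_all(' dataRow odd')

# Select <tr> elements that carry both the dataRow and the odd class.
rows = soup.select('tr.dataRow.odd')

# .text concatenates every string inside the row into one chunk.
row_text = rows[0].text

print(by_name)    # []
print(len(rows))  # 1
print(row_text)   # textexttextActive8/02/2019
```

The last line shows the "one big chunk" problem: the cell boundaries are gone, which is why splitting per cell is usually nicer.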

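Alternatively, to print each piece of text on its own line, you can get the <td> elements inside each <tr> individually and then take .text from each one: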

import bs4

with open('test.html') as html:
    soup = bs4.BeautifulSoup(html, "html.parser")  # Python's built-in HTML parser

# Filter on tag and class (ktb's suggestion from the comments), written here
# as a CSS selector so the stray whitespace in the class attribute is irrelevant.
duh = soup.select('tr.dataRow.odd')
for d in duh:
    tds = d.find_all(['td', 'th'])  # cells only, so link text is not printed twice
    for td in tds:
        cleaned = td.text.strip()  # remove surrounding whitespace and newlines
        if cleaned != '':
            print(cleaned)
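
If you don't need to work cell by cell, a shorter variant is possible with the .stripped_strings generator, which yields each text fragment with surrounding whitespace removed and skips fragments that are empty after stripping. This is only a sketch on a hypothetical, shortened copy of the row (SAMPLE and the /a link are made up), not a drop-in replacement for the loop above:

```python
import bs4

# Hypothetical, shortened version of the <tr> from the question,
# including a non-breaking-space cell and a whitespace-only cell.
SAMPLE = (
    '<table><tr class=" dataRow odd">'
    '<td class="actionColumn">&nbsp;</td>'
    '<th scope="row" class=" dataCell  "><a href="/a">textexttext</a></th>'
    '<td class=" dataCell  ">Active</td>'
    '<td class=" dataCell  "> </td>'
    '<td class=" dataCell  DateElement">8/02/2019</td>'
    '</tr></table>'
)

soup = bs4.BeautifulSoup(SAMPLE, "html.parser")

lines = []
for row in soup.select('tr.dataRow.odd'):
    # One entry per non-empty text fragment, already stripped.
    lines.extend(row.stripped_strings)

for line in lines:
    print(line)
```

On this sample it prints textexttext, Active and 8/02/2019 on separate lines; the &nbsp; and whitespace-only cells are dropped.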