Python: scraping an <a> value from a table is not working

Date: 2017-09-07 21:35:40

Tags: python beautifulsoup

I have this HTML:

<tr class="BgWhite">
  <td headers="th0" valign="top">
    3
  </td>
  <td headers="th1" style="width: 125px;" valign="top">
    <a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8340015511310&amp;category=issue&amp;Scope=" title="go to NSN view">8340-01-551-1310</a>
  </td>

I want to extract the number "8340-01-551-1310", so I used this code:

 test = container1.find_all("td", {"headers": "th1"})
 test1 = test.find_all("a", {"title":"go to NSN view"})

but it raises this error:

"ResultSet object has no attribute '%s'. You're probably treating a  list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

What am I doing wrong, and how do I fix it?

1 answer:

Answer 0 (score: 1)

Here is one way:

from bs4 import BeautifulSoup

data = """<tr class="BgWhite">
  <td headers="th0" valign="top">
    3
  </td>
  <td headers="th1" style="width: 125px;" valign="top">
    <a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8340015511310&amp;category=issue&amp;Scope=" title="go to NSN view">8340-01-551-1310</a>
  </td>"""

soup = BeautifulSoup(data, "lxml")

for td in soup.find_all('td', {"headers": "th1"}):
    for a in td.find_all('a'):
        print(a.text)

Output:

8340-01-551-1310
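
For context on why the question's code failed: find_all() returns a ResultSet, which behaves like a list of tags, so it has no find_all() of its own. Each element inside it does. The following is a minimal sketch of that point, not the only way to write it. It assumes the built-in "html.parser" instead of "lxml" so that no extra install is needed, and the names html, cells and nsn_numbers are illustrative.

```python
from bs4 import BeautifulSoup

# Same markup as in the question (first cell shortened).
html = """<tr class="BgWhite">
  <td headers="th0" valign="top">3</td>
  <td headers="th1" style="width: 125px;" valign="top">
    <a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8340015511310&amp;category=issue&amp;Scope=" title="go to NSN view">8340-01-551-1310</a>
  </td>"""

# Assumption: "html.parser" (standard library) is used here instead of "lxml".
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet: a list-like collection of Tag objects.
cells = soup.find_all("td", {"headers": "th1"})

# Calling cells.find_all(...) raises the AttributeError from the question,
# so search inside each Tag of the ResultSet instead.
nsn_numbers = [
    a.get_text(strip=True)
    for td in cells
    for a in td.find_all("a", {"title": "go to NSN view"})
]

print(nsn_numbers)  # ['8340-01-551-1310']
```

If only the first matching cell matters, indexing the ResultSet (cells[0]) before searching also works, provided the list is not empty.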

However, if you are sure there is only one "th1" cell and only one "a" inside it, or you only want the first of each, you can try:

print(soup.find('td', {"headers": "th1"}).find('a').text)

This returns the same output.

Edit: I just noticed it can be simplified to:

print(soup.find('td', {"headers": "th1"}).a.text)
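
One caveat with the chained form: find() returns None when nothing matches, so on a page without a "th1" cell the chain fails with an AttributeError on None. A hedged sketch of a guarded version follows. The helper name first_nsn is hypothetical, and "html.parser" is again assumed in place of "lxml".

```python
from bs4 import BeautifulSoup


def first_nsn(soup):
    """Hypothetical helper: text of the first <a> in the first "th1" cell, or None."""
    td = soup.find("td", {"headers": "th1"})
    if td is None:
        return None  # no matching cell on this page
    a = td.find("a")
    if a is None:
        return None  # cell exists but contains no link
    return a.get_text(strip=True)


# Assumption: "html.parser" (standard library) is used instead of "lxml".
row = BeautifulSoup(
    '<tr><td headers="th1"><a title="go to NSN view">8340-01-551-1310</a></td></tr>',
    "html.parser",
)
empty = BeautifulSoup("<p>no table here</p>", "html.parser")

print(first_nsn(row))    # 8340-01-551-1310
print(first_nsn(empty))  # None
```

Whether to return None or raise is a design choice. Returning None keeps a scraping loop running when a row is missing its link.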