I want to scrape 8415219212510 from the following HTML:
<td headers="th1" style="width: 125px;" valign="top">
<a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8415219212510&category=issue&Scope=" title="go to NSN view">8415-21-921-2510</a>
</td>
I tried:
main_page = 'https://www.dibbs.bsm.dla.mil/RFQ/RfqRecs.aspx?category=issue&TypeSrch=dt&Value=09-14-2017'
dibbssoup = BeautifulSoup(main_page.content, 'html5lib')
#grabs each rfq
containers1 = dibbssoup.find_all("tr", {"class": "BgWhite"})
NSN = container1.find("td", {"headers": "th1"}).a.get_text(strip=True)
and
NSN = container1.find("td", {"headers": "th1"}).a.text
I still get these errors:
AttributeError: 'NoneType' object has no attribute 'get_text'
AttributeError: 'NoneType' object has no attribute 'text'
How can I fix this error?
Answer 0 (score: 0)
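First, BeautifulSoup does not handle any HTTP interaction; you have to give it the HTML document after it has been downloaded. You can use a library such as requests for that. I cannot access the link you provided, but I hope this gives you the right idea.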
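import requests
from bs4 import BeautifulSoup

# URL of the page to scrape (the main_page value from the question)
URL = 'https://www.dibbs.bsm.dla.mil/RFQ/RfqRecs.aspx?category=issue&TypeSrch=dt&Value=09-14-2017'
# Download the page first; BeautifulSoup only parses the HTML it is given
response = requests.get(URL)
soup = BeautifulSoup(response.content, 'lxml')
# First link whose title attribute contains "NSN"; its text is the dashed form
NSN = soup.select_one('a[title*=NSN]').get_text()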
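The selector above returns the link text, which is the dashed form 8415-21-921-2510, and it still raises AttributeError if nothing matches. Here is a minimal sketch of one way to get the undashed number and skip rows without a link. It assumes the rows look like the fragment in the question, which is parsed here directly instead of the live page. The helper name extract_nsn is hypothetical, and the built-in html.parser is used in place of lxml only to keep the example dependency-free.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# HTML fragment copied from the question, wrapped in a table row for illustration.
# In practice this would be response.content from requests.get(URL).
HTML = """
<table><tr class="BgWhite">
<td headers="th1" style="width: 125px;" valign="top">
<a href="https://www.dibbs.bsm.dla.mil/RFQ/RFQNsn.aspx?value=8415219212510&category=issue&Scope=" title="go to NSN view">8415-21-921-2510</a>
</td>
</tr></table>
"""


def extract_nsn(row):
    """Return the undashed NSN from one row, or None if the link is missing."""
    link = row.select_one('td[headers="th1"] a')
    if link is None:  # select_one returns None when nothing matches
        return None
    # Prefer the "value" query parameter of the href; fall back to the link text.
    query = parse_qs(urlparse(link.get("href", "")).query)
    if "value" in query:
        return query["value"][0]
    return link.get_text(strip=True).replace("-", "")


soup = BeautifulSoup(HTML, "html.parser")
# find_all returns a list of rows, so loop over it rather than calling find on the list.
nsns = [extract_nsn(row) for row in soup.find_all("tr", {"class": "BgWhite"})]
print(nsns)
```

Checking for None before calling get_text is what avoids the 'NoneType' error on rows whose th1 cell has no link. Those rows simply yield None in the result list and can be filtered out afterwards.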