I want to get every NSN and the description next to it from this site: http://www.iso-parts.com/Index/1.
The code I tried is:
import requests
from bs4 import BeautifulSoup
import urllib3
import pyrebase
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
NSNurl = 'http://www.iso-parts.com/Index/1'
uClient = requests.get(NSNurl, verify=False)
page_html = uClient.content
# close client
uClient.close()
page_soup = soup(page_html, "html.parser")
container = page_soup.find_all("td", {"class": "tdD"})
for container1 in container:
    NSN = container1.find("td", {"class": "tdD"})
    print(NSN)
Instead of getting a list of NSNs such as 1005-00-130-5515, all I get is None. How can I get all of the NSNs on the site?
Answer 0 (score: 0)
Currently, you are getting all of the td elements with class tdD and then searching inside each of those td elements for another td. Since none of them contains a nested <td class="tdD">, find() returns None.
container = page_soup.find_all("td", {"class": "tdD"})
for container1 in container:
    # container1 is a <td></td>
    NSN = container1.find("td", {"class": "tdD"})  # here
    print(NSN)
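As a quick standalone illustration (not part of the original answer), find() only searches an element's descendants, so calling it on a <td> that has no nested <td> gives None:

from bs4 import BeautifulSoup

# A single cell similar to the ones on the page (the contents are made up for the demo).
cell = BeautifulSoup('<td class="tdD"><a href="#">1005-00-130-5515</a></td>',
                     "html.parser").td
print(cell.find("td", {"class": "tdD"}))  # None - no <td> nested inside this <td>
print(cell.find("a").contents[0])         # 1005-00-130-5515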
Instead, you need to take the anchor from container1 itself. Also, each row actually has two <td> elements with class tdD: one holding the number and one holding the text description. So you need to handle the cells that have no anchor nested inside them (the descriptions) differently.
for container1 in container:  # container1 is a <td> element
    nsn = container1.find('a')  # get the anchor from the <td>
    if nsn is not None:  # if an anchor is found
        print(nsn.contents[0])  # print it
    else:
        print(container1.text)  # the description
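If you want the NSNs paired with their descriptions instead of printed one after another, a minimal sketch along the same lines is shown below. It assumes each table row (<tr>) contains exactly one NSN cell (with an anchor) and one description cell, both with class tdD; that row layout is an assumption, not something confirmed in the original post.

import requests
import urllib3
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)  # silence the verify=False warning

NSNurl = 'http://www.iso-parts.com/Index/1'
page_soup = BeautifulSoup(requests.get(NSNurl, verify=False).content, "html.parser")

pairs = []
for row in page_soup.find_all("tr"):
    cells = row.find_all("td", {"class": "tdD"})
    if len(cells) != 2:                # assumed layout: NSN cell + description cell
        continue
    anchor = cells[0].find("a")
    if anchor is None:                 # skip rows whose first cell has no NSN link
        continue
    pairs.append((anchor.get_text(strip=True), cells[1].get_text(strip=True)))

for nsn, description in pairs:
    print(nsn, "-", description)

Collecting (nsn, description) tuples first keeps the pairing explicit and makes it easy to write the results to a file or database later.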