Website crawling with Python and Beautiful Soup not working

Asked: 2017-09-20 00:58:16

Tags: python beautifulsoup

I want to get every NSN, and the description next to it, from this site: http://www.iso-parts.com/Index/1.

The code I tried is:

import requests
from bs4 import BeautifulSoup as soup

NSNurl = 'http://www.iso-parts.com/Index/1'
uClient = requests.get(NSNurl, verify=False)
page_html = uClient.content

# close client
uClient.close()
page_soup = soup(page_html, "html.parser")
container = page_soup.find_all("td", {"class": "tdD"})

for container1 in container:
        NSN = container1.find("td", {"class": "tdD"})
        print(NSN)

Instead of getting the list of NSNs, such as 1005-00-130-5515, all I get is None. How can I get all the NSNs on the site?

1 Answer:

Answer 0 (score: 0)

Currently you are fetching all the tds and then, for each td, searching for another td inside it:

container = page_soup.find_all("td", {"class": "tdD"})

for container1 in container:
        #container1 is a <td></td>
        NSN = container1.find("td", {"class": "tdD"}) #here
        print(NSN)
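A quick way to see why this prints None: BeautifulSoup's find() searches only an element's descendants, and a <td class="tdD"> cell on this page does not contain another td. A minimal sketch with hypothetical markup modeled on the site:

```python
from bs4 import BeautifulSoup

# Hypothetical row markup: the NSN sits in an anchor inside the <td>.
html = '<tr><td class="tdD"><a href="#">1005-00-130-5515</a></td></tr>'
td = BeautifulSoup(html, "html.parser").find("td", {"class": "tdD"})

# find() looks at descendants only, so searching a <td> for a <td>
# of the same class finds nothing and returns None.
print(td.find("td", {"class": "tdD"}))  # None
```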

Instead, you need to get the text value of the anchor inside container1.

Also, each row actually contains two <td class="tdD"> cells: one holding the number and one holding the text description. So you should check whether a cell has an anchor nested inside it and handle the two cases separately.

for container1 in container:    # container1 is a <td> element
    nsn = container1.find('a')  # get the anchor from the <td>
    if nsn is not None:         # an anchor was found: this is the NSN cell
        print(nsn.contents[0])  # print the NSN
    else:
        print(container1.text)  # the description cell
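Putting the pieces together, here is a minimal sketch that pairs each NSN with the description that follows it. The row layout (an NSN cell with an anchor, immediately followed by a description cell) is an assumption based on the answer above, and parse_nsn_pairs is a hypothetical helper, not part of the original post:

```python
from bs4 import BeautifulSoup

def parse_nsn_pairs(page_html):
    """Pair each NSN (anchor text) with the description cell after it.

    Assumes each row holds two <td class="tdD"> cells: the NSN inside
    an anchor, then a plain-text description.
    """
    page_soup = BeautifulSoup(page_html, "html.parser")
    pairs = []
    nsn = None
    for td in page_soup.find_all("td", {"class": "tdD"}):
        a = td.find("a")
        if a is not None:            # NSN cell: the number is inside an anchor
            nsn = a.text.strip()
        elif nsn is not None:        # description cell that follows an NSN cell
            pairs.append((nsn, td.text.strip()))
            nsn = None
    return pairs

# Offline example with the assumed row layout (made-up values):
sample = '''<table>
<tr><td class="tdD"><a href="/x">1005-00-130-5515</a></td>
    <td class="tdD">MOUNT,MACHINE GUN</td></tr>
</table>'''
print(parse_nsn_pairs(sample))
```

Against the live site you would fetch page_html with requests.get(NSNurl, verify=False).content as in the question, then call the helper on it.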