BeautifulSoup: getting href values from an unordered list

Date: 2016-03-23 13:27:53

Tags: bs4

Why doesn't this code print the three links for the three hits that the search engine found?

from bs4 import BeautifulSoup

from urllib import urlopen

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
url = "https://www.conrad.de/de/Search.html?search=mosfet+driver"
page = opener.open(url)
soup = BeautifulSoup(page, "html5lib")
#print soup.prettify


result = soup.find_all('ul', class_="ccpProductList")

products  = result[0].find_all('a', class_="ccpProductListItem__title")

for product in products:
    print product.href

The output is None, three times.

1 answer:

Answer 0 (score: 0)

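I think the problem is your last line, print product.href. On a BeautifulSoup tag, dot access such as product.href searches for a child tag named href rather than reading the HTML attribute, so it returns None. I think you need print product['href'] instead. The following code does what I think you want: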

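from bs4 import BeautifulSoup
import urllib2

# Fetch the search results page (Python 2, urllib2)
url = 'https://www.conrad.de/de/Search.html?search=mosfet+driver'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
page = response.read()
soup = BeautifulSoup(page, 'html.parser')
# print soup.prettify()  # uncomment to inspect the parsed markup

# Find the product list, then the title links inside it
result = soup.find_all('ul', class_="ccpProductList")
products = result[0].find_all('a', class_="ccpProductListItem__title")
for p in products:
    # Index the tag like a dict to read the href attribute
    print p['href']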
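
To see the difference without hitting the live site, here is a minimal sketch that runs against a small inline HTML snippet. The snippet is a hypothetical stand-in that only reuses the class names from the question; the real page markup may differ. It uses Python 3 style print() calls and also shows tag.get('href'), which returns None instead of raising KeyError when an anchor has no href.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the search results page; the real markup may differ.
SAMPLE_HTML = """
<ul class="ccpProductList">
  <li><a class="ccpProductListItem__title" href="/de/product-1.html">Product 1</a></li>
  <li><a class="ccpProductListItem__title" href="/de/product-2.html">Product 2</a></li>
  <li><a class="ccpProductListItem__title">No link</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
products = soup.find_all("a", class_="ccpProductListItem__title")

# Dot access looks for a child tag named <href>, which does not exist here,
# so every entry is None (the behaviour seen in the question).
dot_access = [p.href for p in products]

# .get() reads the HTML attribute and returns None when it is missing,
# whereas p['href'] would raise KeyError for the third anchor.
hrefs = [p.get("href") for p in products]

print(dot_access)
print(hrefs)
```

If every matched anchor is guaranteed to carry an href, p['href'] as in the answer is fine; .get('href') is just the more forgiving choice when some anchors might not.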