Python bs4 BeautifulSoup: findAll gives empty brackets

Time: 2017-09-04 21:12:31

Tags: python beautifulsoup bs4 findall

When I run this code, it gives me empty brackets. I'm new to web scraping, so I don't know what I'm doing wrong.

import requests

from bs4 import BeautifulSoup

url = 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=laptop'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'html.parser')

container = soup.findAll('li', {'class': 's-result-item celwidget '})
#btw the space is also there in the html code

print(container)

Result:

[]

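What I'm trying to do is fetch the HTML from the website and narrow it down to the li tags that hold all the information, so that I can print everything out in a for loop.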

Also, if someone wants to explain how to use BeautifulSoup, we can always talk. Thank you, guys.

1 answer:

Answer 0: (score: 0)

Working code for grabbing the product and price could look something like this.

import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=laptop'
r = requests.get(url, headers={'User-Agent': 'Mozilla Firefox'})
soup = BeautifulSoup(r.text, 'html.parser')
container = soup.findAll('li', {'class': 's-result-item celwidget '})
for cont in container:
    h2 = cont.h2.text.strip()

    # Amazon lists prices in two ways. If one fails, use the other
    try:
        currency = cont.find('sup', {'class': 'sx-price-currency'}).text.strip()
        price = currency + cont.find('span', {'class': 'sx-price-whole'}).text.strip()
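    except AttributeError:
        # find() returned None for the first layout, so try the second one
        fallback = cont.find('span', {'class': 'a-size-base a-color-base'})
        price = fallback.text.strip() if fallback else None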
    print('Product: {}, Price: {}'.format(h2, price))
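
Compared with the code in the question, the only change to the request itself is the headers argument, so the empty list was most likely caused by the site serving a different page to the default requests client. As a rough sketch of that difference, you can inspect the User-Agent each variant would send. Nothing goes over the network here; the session and the two prepared requests (default_req, custom_req) are names introduced only for illustration.

```python
import requests

url = 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=laptop'

# prepare_request() builds the request and merges in the session's
# default headers, but does not send anything
session = requests.Session()
default_req = session.prepare_request(requests.Request('GET', url))
custom_req = session.prepare_request(
    requests.Request('GET', url, headers={'User-Agent': 'Mozilla Firefox'})
)

# Without an explicit header, requests identifies itself as python-requests/<version>
print('Default User-Agent:', default_req.headers['User-Agent'])
print('Custom User-Agent: ', custom_req.headers['User-Agent'])
```

If the list is still empty after setting the header, printing r.status_code and the first few hundred characters of r.text is a quick way to see what page actually came back.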

If this helps you, please help me out in return...
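
To experiment with the li lookup without hitting the site, here is a minimal sketch that runs against a small inline snippet. SAMPLE_HTML, the price class, and extract_products are made up for illustration and are not real Amazon markup. It uses select() with a CSS selector, which matches each class on its own, so a trailing space in the attribute or an extra class on the tag should not matter.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a search results page (not real Amazon markup)
SAMPLE_HTML = """
<ul>
  <li class="s-result-item celwidget "><h2> Laptop A </h2><span class="price">$499</span></li>
  <li class="s-result-item celwidget extra"><h2>Laptop B</h2></li>
  <li class="unrelated"><h2>Ad</h2></li>
</ul>
"""


def extract_products(html):
    """Return (title, price) pairs for every li that carries both classes."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    # li.s-result-item.celwidget: an li tag that has both classes,
    # regardless of order, extra classes, or whitespace in the attribute
    for item in soup.select('li.s-result-item.celwidget'):
        title = item.h2.get_text(strip=True) if item.h2 else None
        price_tag = item.find('span', class_='price')
        price = price_tag.get_text(strip=True) if price_tag else None
        products.append((title, price))
    return products


products = extract_products(SAMPLE_HTML)
for title, price in products:
    print('Product: {}, Price: {}'.format(title, price))
```

Missing pieces come back as None instead of raising, which keeps the loop going when a result has no price.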