Python / BeautifulSoup find_all() does not find all results

Date: 2017-11-06 12:56:40

Tags: python python-3.x web-scraping beautifulsoup

When I use find_all(), I should get 100 results, but I only get 25.

CODE

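Below is my code, which scrapes tweakers.net and tries to return every element whose class is largethumb.

Once I have those elements, I extract the name and price from each one.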

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://tweakers.net/categorie/545/geheugen-intern/producten/'
uClient = uReq(my_url)  # open the page once and read the raw HTML
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

# should be 100 tr objects
products = page_soup.find_all("tr", attrs={"class": "largethumb"})
for product in products:
    title = product.p.text  # first <p> in the row holds the product name
    price_container = product.find_all("p", {"class": "price"})
    price = price_container[0].text
    lijst = title, price
    print(lijst)
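find_all() returns every matching element present in the HTML it is given, so a count of 25 means the server sent 25 rows, not that BeautifulSoup skipped any. To check the parsing logic itself without hitting the live site, the same loop can be run against a small hand-written fragment. The markup below is a hypothetical, simplified stand-in for a tweakers.net listing (only the `largethumb` rows and `price` paragraphs mirror the question), and the helper name `extract_products` is illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified listing markup; NOT the real tweakers.net HTML.
SAMPLE_HTML = """
<table>
  <tr class="largethumb">
    <td><p>Corsair Vengeance LPX CMK16GX4M2B3000C15</p><p class="price">€ 174,90</p></td>
  </tr>
  <tr class="largethumb">
    <td><p>Example DDR4 kit</p><p class="price">€ 99,00</p></td>
  </tr>
  <tr class="header"><td><p>Not a product row</p></td></tr>
</table>
"""


def extract_products(html):
    """Return (title, price) tuples for every <tr class="largethumb"> row."""
    page_soup = BeautifulSoup(html, "html.parser")
    results = []
    for product in page_soup.find_all("tr", attrs={"class": "largethumb"}):
        title = product.p.text  # first <p> anywhere inside the row
        price_tag = product.find("p", class_="price")  # None if the row has no price
        price = price_tag.text if price_tag is not None else None
        results.append((title, price))
    return results


products = extract_products(SAMPLE_HTML)
print(len(products))  # 2
for row in products:
    print(row)
```

The header row is skipped because its class is not `largethumb`; using `find()` with a `None` check avoids the `IndexError` that `find_all(...)[0]` raises on a row without a price.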

RESULT

The output is 25 tuples like this one:

('Corsair Vengeance LPX CMK16GX4M2B3000C15', '€ 174,90')

2 answers:

Answer 0 (score: 1)

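By default, the site in question shows 25 search results. If your web browser shows a different number, that is because your browser has cookies from that site. If you want 100 results, change my_url as follows: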

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://tweakers.net/categorie/545/geheugen-intern/producten/?pageSize=100&page=1'
uClient = uReq(my_url)  # open the page once and read the raw HTML
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

# WILL be 100 tr objects
products = page_soup.find_all("tr", attrs={"class": "largethumb"})
for product in products:
    title = product.p.text  # first <p> in the row holds the product name
    price_container = product.find_all("p", {"class": "price"})
    price = price_container[0].text
    lijst = title, price
    print(lijst)
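Rather than hard-coding the query string, the two parameters from this URL (`pageSize` and `page`) can be assembled with the standard library's `urllib.parse.urlencode`. This is a small sketch: the helper name `listing_url` is hypothetical, and whether the site honours page sizes other than the ones its own buttons offer is not confirmed here.

```python
from urllib.parse import urlencode

BASE_URL = "https://tweakers.net/categorie/545/geheugen-intern/producten/"


def listing_url(page_size=100, page=1):
    """Build the listing URL with the pageSize/page query parameters."""
    # urlencode keeps dict insertion order, so pageSize comes before page.
    query = urlencode({"pageSize": page_size, "page": page})
    return f"{BASE_URL}?{query}"


print(listing_url())
# https://tweakers.net/categorie/545/geheugen-intern/producten/?pageSize=100&page=1
```

The result can be passed straight to `uReq(...)` in place of the hand-written `my_url`.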

To show that it works:

>>> from requests import get
>>> from bs4 import BeautifulSoup
>>> my_url = 'https://tweakers.net/categorie/545/geheugen-intern/producten/?pageSize=100&page=1'
>>> r = get(my_url)
>>> soup = BeautifulSoup(r.content, 'html5lib')
>>> len(soup.find_all('tr', attrs={"class": "largethumb"}))
100
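If more rows are needed than one page holds, the `page` parameter in the URL suggests the rest can be fetched page by page. Below is a minimal sketch that uses only the standard library for downloading, under two ASSUMPTIONS not verified against tweakers.net: incrementing `page` yields the next batch, and a page with no `largethumb` rows marks the end. `max_pages` caps the loop in case the site instead keeps returning its last page. The names `fetch_page` and `scrape_all` are hypothetical; `fetch` is a parameter so the loop can be exercised offline with canned HTML.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

from bs4 import BeautifulSoup

BASE_URL = "https://tweakers.net/categorie/545/geheugen-intern/producten/"


def fetch_page(page, page_size=100):
    """Download one listing page and return its raw bytes."""
    query = urlencode({"pageSize": page_size, "page": page})
    with urlopen(f"{BASE_URL}?{query}", timeout=10) as response:
        return response.read()  # BeautifulSoup detects the encoding from bytes


def scrape_all(fetch=fetch_page, max_pages=50):
    """Collect <tr class="largethumb"> rows page by page.

    ASSUMPTION: an empty page means the listing has ended.
    """
    rows = []
    for page in range(1, max_pages + 1):
        page_soup = BeautifulSoup(fetch(page), "html.parser")
        found = page_soup.find_all("tr", attrs={"class": "largethumb"})
        if not found:
            break  # no product rows: assume we are past the last page
        rows.extend(found)
    return rows
```

Calling `scrape_all()` with no arguments performs real requests; adding a short pause between pages is a courteous choice when scraping a live site.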

If you hover over the 100-results button at the bottom left of the page, you will see that this is the URL it points to. Happy scraping!

Answer 1 (score: 1)

Give this a try. It fetches all 25 results on the page:

from bs4 import BeautifulSoup
import requests

res = requests.get('https://tweakers.net/categorie/545/geheugen-intern/producten/')
soup = BeautifulSoup(res.text, "lxml")  # "lxml" requires the lxml package
for product in soup.find_all(class_="largethumb"):
    title = product.find_all(class_="editionName")[0]['title']  # name from the title attribute
    price = product.find_all(class_="price")[0].text
    print(title, price)

By the way, the link you provided shows 25 results per page.
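This answer's `class_=` keyword and the question's `attrs={"class": ...}` dictionary are two spellings of the same filter. BeautifulSoup treats `class` as a multi-valued attribute, so a row with `class="largethumb odd"` still matches `largethumb`; a CSS selector via `select()` is a third option. The markup below is a hypothetical stand-in: the `editionName` class with a `title` attribute is taken from this answer, everything else is invented for the demo.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the first row carries two classes.
html = """
<table>
  <tr class="largethumb odd"><td><a class="editionName" title="Kit A">Kit A</a><p class="price">€ 10</p></td></tr>
  <tr class="largethumb"><td><a class="editionName" title="Kit B">Kit B</a><p class="price">€ 20</p></td></tr>
</table>
"""
page_soup = BeautifulSoup(html, "html.parser")

# All three styles match both rows, because "largethumb" is one of each row's classes.
by_keyword = page_soup.find_all("tr", class_="largethumb")
by_attrs = page_soup.find_all("tr", attrs={"class": "largethumb"})
by_css = page_soup.select("tr.largethumb")

for row in by_css:
    edition = row.find(class_="editionName")
    title = edition.get("title")  # .get() returns None instead of raising KeyError
    price = row.find(class_="price").get_text(strip=True)
    print(title, price)
```

`find()` returns the first match directly, which reads a little more plainly than `find_all(...)[0]` when only one element is expected.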