When I use find_all() I should get 100 results, but I only get 25.
Code:
以下是我抓取tweakers并尝试返回类等于largethumb的每个元素的代码。
一旦我这样做,我就过滤出名称和价格。
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://tweakers.net/categorie/545/geheugen-intern/producten/'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
# should be 100 tr objects
products = page_soup.find_all("tr", attrs={"class": "largethumb"})
for product in products:
    title = product.p.text
    price_container = product.find_all("p", {"class": "price"})
    price = price_container[0].text
    lijst = title, price
    print(lijst)
Result:
The output is 25 lines like this one:
('Corsair Vengeance LPX CMK16GX4M2B3000C15', '€ 174,90')
Answer 0 (score: 1)
By default, the site in question displays 25 search results. If your web browser shows something different, that is because your browser has cookies from the site. If you want 100 results, modify my_url as follows:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://tweakers.net/categorie/545/geheugen-intern/producten/?pageSize=100&page=1'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
# WILL be 100 tr objects
products = page_soup.find_all("tr", attrs={"class": "largethumb"})
for product in products:
    title = product.p.text
    price_container = product.find_all("p", {"class": "price"})
    price = price_container[0].text
    lijst = title, price
    print(lijst)
Proof that it works:
>>> from requests import get
>>> from bs4 import BeautifulSoup
>>> my_url = 'https://tweakers.net/categorie/545/geheugen-intern/producten/?pageSize=100&page=1'
>>> r = get(my_url)
>>> soup = BeautifulSoup(r.content, 'html5lib')
>>> len(soup.find_all('tr', attrs={"class": "largethumb"}))
100
If you hover over the "100 results" button at the bottom left, you'll see that this URL is exactly what it redirects to. Happy scraping!
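The same pageSize/page query parameters can be combined to walk several result pages in a row. A minimal sketch, assuming those parameters keep behaving as in the URL above (the paginated_urls helper is hypothetical, not part of the original answer):

```python
def paginated_urls(base, page_size=100, pages=3):
    # Build one URL per results page, using the pageSize/page
    # parameters shown in the answer above (assumed behavior).
    return [f"{base}?pageSize={page_size}&page={n}" for n in range(1, pages + 1)]

for url in paginated_urls('https://tweakers.net/categorie/545/geheugen-intern/producten/'):
    print(url)
```

Each URL could then be fed to the same urlopen/BeautifulSoup loop as above, and the per-page results concatenated.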
Answer 1 (score: 1)
Give this a try. It will fetch all 25 results:
from bs4 import BeautifulSoup
import requests

res = requests.get('https://tweakers.net/categorie/545/geheugen-intern/producten/')
soup = BeautifulSoup(res.text, "lxml")
for product in soup.find_all(class_="largethumb"):
    title = product.find_all(class_="editionName")[0]['title']
    price = product.find_all(class_="price")[0].text
    print(title, price)
By the way, the link you provided displays 25 results per page.
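If you'd rather not concatenate the query string by hand, the paging parameters from the accepted answer can also be built with the standard library. A small sketch (the parameter names pageSize and page are the ones from the URL above; whether other values are honored by the site is an assumption):

```python
from urllib.parse import urlencode

base = 'https://tweakers.net/categorie/545/geheugen-intern/producten/'
# Encode the paging parameters instead of splicing them into the string.
url = f"{base}?{urlencode({'pageSize': 100, 'page': 1})}"
print(url)
```

The resulting url can then be passed to requests.get() or urlopen() exactly like the hard-coded version.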