I am trying to get the name and price of each product from https://www.daraz.pk/catalog/?q=risk, but nothing is displayed.
containers = page_soup.find_all("div", {"class": "c2p6A5"})
for container in containers:
    pname = container.findAll("div", {"class": "c29Vt5"})
    name = pname[0].text
    price1 = container.findAll("span", {"class": "c29VZV"})
    price = price1[0].text
    print(name)
    print(price)
Answer 0 (score: 3)
If the page is rendered dynamically, Selenium should take care of it:
from bs4 import BeautifulSoup
from selenium import webdriver

# Let a real browser load the page so the JavaScript-generated markup exists
browser = webdriver.Chrome()
browser.get('https://www.daraz.pk/catalog/?q=risk')
r = browser.page_source
page_soup = BeautifulSoup(r, 'html.parser')
containers = page_soup.find_all("div", {"class": "c2p6A5"})
for container in containers:
    pname = container.find_all("div", {"class": "c29Vt5"})
    name = pname[0].text
    price1 = container.find_all("span", {"class": "c29VZV"})
    price = price1[0].text
    print(name)
    print(price)
browser.close()
Output:
Risk Strategy Game
Rs. 5,900
Risk Classic Board Game
Rs. 945
RISK - The Game of Global Domination
Rs. 1,295
Risk Board Game
Rs. 1,950
Risk Board Game - Yellow
Rs. 3,184
Risk Board Game - Yellow
Rs. 1,814
Risk Board Game - Yellow
Rs. 2,086
Risk Board Game - The Game of Global Domination
Rs. 975
...
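Answer 1 (score: 3)
The page contains JSON data. You could pull it out of the <script> tag with BeautifulSoup, but I don't think that is necessary, because you can get it directly with json and re.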
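A minimal sketch of the json-plus-re approach described above, not the answerer's original code. It assumes the page embeds its data as `window.pageData={...}</script>` and that products sit under `mods` → `listItems` with `name` and `price` keys, as the other answers in this thread use. The function name `extract_items` and the sample HTML are made up for illustration; in practice the HTML string would come from something like `requests.get(url).text`.

```python
import json
import re

# Assumption: the page embeds its data as `window.pageData={...}</script>`,
# the marker used by the other answers in this thread.
PAGE_DATA_RE = re.compile(r"window\.pageData=(.*?)</script>", re.DOTALL)


def extract_items(html):
    """Return (name, price) pairs found in the embedded JSON, or [] if absent."""
    match = PAGE_DATA_RE.search(html)
    if match is None:
        return []
    data = json.loads(match.group(1))
    items = data.get("mods", {}).get("listItems", [])
    return [(item.get("name"), item.get("price")) for item in items]


# Made-up sample standing in for a downloaded page.
sample_html = (
    '<html><script>window.pageData={"mods": {"listItems": ['
    '{"name": "Risk Board Game", "price": "1950"}]}}</script></html>'
)
print(extract_items(sample_html))
```

Returning an empty list when the marker is missing makes it easy to tell "no products" apart from a crash if the site changes its markup.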
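Answer 2 (score: 1)
I was wrong. The JSON contains the information needed to work out the number of pages, so you can get all of the results. No regular expression is needed, because you can extract the relevant script tag directly. You can also build the page URLs in a loop.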
import requests
from bs4 import BeautifulSoup
import json
import math

def getNameAndPrice(url):
    # These are set on the first page and read by the loop at the bottom
    global resultCount, resultsPerPage, numPages
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    # The third <script> tag holds "window.pageData={...}"; drop the prefix to leave plain JSON
    data = json.loads(soup.select('script')[2].text.replace('window.pageData=', '', 1).strip())
    if url == startingPage:
        resultCount = int(data['mainInfo']['totalResults'])
        resultsPerPage = int(data['mainInfo']['pageSize'])
        numPages = math.ceil(resultCount / resultsPerPage)
    result = [[item['name'], item['price']] for item in data['mods']['listItems']]
    return result

resultCount = 0
resultsPerPage = 0
numPages = 0
link = "https://www.daraz.pk/catalog/?page={}&q=risk"
startingPage = "https://www.daraz.pk/catalog/?page=1&q=risk"
results = []
results.append(getNameAndPrice(startingPage))
# Pages 2..numPages; page 1 was fetched above
for links in [link.format(page) for page in range(2, numPages + 1)]:
    results.append(getNameAndPrice(links))
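The page-count arithmetic from this answer can also be kept in a small pure function, which avoids sharing state between the scraper and the loop. This is only a sketch: `build_page_urls` is a hypothetical helper, and the counts in the example are made-up values rather than figures from the site.

```python
import math


def build_page_urls(total_results, page_size, template):
    """Return one URL per result page, numbered from 1."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Round up so a partially filled last page is still included
    num_pages = math.ceil(total_results / page_size)
    return [template.format(page) for page in range(1, num_pages + 1)]


# Made-up counts: 85 results at 40 per page need 3 pages.
urls = build_page_urls(85, 40, "https://www.daraz.pk/catalog/?page={}&q=risk")
print(len(urls))
print(urls[-1])
```

The two counts would be read from `data['mainInfo']` on the first page, as in the answer above, and the remaining URLs fetched in turn.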
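Answer 3 (score: 1)
I recommend the JSON answer to newbies like me. You can use Selenium to navigate to the search results page like this:
PS: Many thanks to @ewwink. You saved my day!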
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time  # delay while the page loads
import json, re

keyword = 'fan'
opt = webdriver.ChromeOptions()
opt.add_argument('headless')
driver = webdriver.Chrome(options=opt)
# driver = webdriver.Chrome()
url = 'https://www.lazada.co.th/'
driver.get(url)
# find_element_by_name was removed in newer Selenium releases; By.NAME works in both
search = driver.find_element(By.NAME, 'q')
search.send_keys(keyword)
search.send_keys(Keys.RETURN)
time.sleep(3)  # wait 3 seconds for the results page to load
page_html = driver.page_source  # Selenium equivalent of page_html = webopen.read() for BS
driver.close()
# Capture the JSON assigned to window.pageData inside the script tag
jsonStr = re.search(r'window\.pageData=(.*?)</script>', page_html).group(1)
jsonObject = json.loads(jsonStr)
for item in jsonObject['mods']['listItems']:
    print(item['name'])
    print(item['sellerName'])