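I can't retrieve the product data I need from a website. I can see the part of the HTML that I think I need to scrape, but my code returns no data. It works for some HTML tags on the same page, but not for the ones I want.
I'm a real beginner. I've watched YouTube videos and tried searching the questions and answers here. As far as I can tell, the data I need from the site may not be in the HTML itself but embedded in it somehow (?).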
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.harristeeter.com/specials/weekly-list/best-deals'

# Download the raw HTML of the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Parse the HTML and count the matching <div> elements
page_soup = soup(page_html, "html.parser")
print(len(page_soup.find_all("div", {"class": "product_infoBox"})))
print(len(page_soup.find_all("div", {"class": "container"})))
With this code I get results for "container" (5) but none for "product_infoBox" (0). "product_infoBox" is the part I need.
Answer 0 (score: 0)
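The page loads its data dynamically via JSON, so the product boxes are not in the HTML that urlopen receives. You can fetch the same JSON directly with requests. This script searches for a store, selects the first result, and loads its weekly specials: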
import requests
from bs4 import BeautifulSoup
import json
store_search_url = 'https://www.harristeeter.com/api/v1/stores/search?Address={}&Radius=10000&AllStores=true&NewOrdering=false&OnlyPharmacy=false&OnlyFreshFood=false&FreshFoodOrdering=undefined'
weekly_specials_url = 'https://www.harristeeter.com/api/v1/stores/{}/departments/0/weekly_specials?'
headers = {'Referer': 'https://www.harristeeter.com/store-locator'}
with requests.session() as s:
    r = s.get('https://www.harristeeter.com/store-locator', headers=headers)
    store_search_data = s.get(store_search_url.format('pine ridge plaza, reynolda road'), headers=headers).json()

    # This prints all results from store search:
    # print(json.dumps(store_search_data, indent=4))

    # we select the first match:
    store_number = store_search_data['Data'][0]['Number']
    weekly_specials_data = s.get(weekly_specials_url.format(store_number), headers=headers).json()
    print(json.dumps(weekly_specials_data, indent=4))
Prints:
{
    "Status": "success",
    "Data": [
        {
            "ID": "4615146",
            "AdWeek": "2019-07-16",
            "DepartmentNumber": "4",
            "AdWeekExpires": "07/16/2019",
            "ActiveAdWeekRelease": "2019-07-16",
            "StartDate": "7/10/2019",
            "EndDate": "7/16/2019",
            "IsCardRequired": true,
            "Title": "Harris Teeter Cottage Cheese, Sour Cream, French",
            "Description": "8-16 oz",
            "Detail": "e-VIC Member Price $1.27",
            "Price": "2/$3",
            "SpecialPrice": "$1.27",
            "DesktopImageUrl": "https://23360934715048b8b9a2-b55d76cb69f0e86ca2d9837472129d5a.ssl.cf1.rackcdn.com/sm_4615146.jpg",
            "MobileImageUrl": "https://23360934715048b8b9a2-b55d76cb69f0e86ca2d9837472129d5a.ssl.cf1.rackcdn.com/sm_4615146.jpg",
            "Limit": "6",
            "Savings": "Save at Least 38\u00a2 on 2",
            "Size": "8-16 oz",
            "Subtitle": "Limit 6 at e-VIC Price",
            "IsAdded": false,
            "RetinaImageUrl": "https://23360934715048b8b9a2-b55d76cb69f0e86ca2d9837472129d5a.ssl.cf1.rackcdn.com/4615146.jpg",
            "TIE": "1",
            "Organic": "0",
            "Type": "EVIC",
            "DepartmentName": "Dairy & Chilled Foods"
        },
        {
            "ID": "4614507",
            ... and so on.
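
Once you have the JSON, you probably only want a few fields per product rather than the full dump. Here is a minimal sketch of how that could look. It assumes the response keeps the shape printed above (a "Status" string and a "Data" list of dicts); summarize_specials is a hypothetical helper name of my own, and the sample is a trimmed copy of the first item so the snippet runs without hitting the site:

```python
import json

# Trimmed copy of the first item from the output above, so this runs offline.
sample_response = json.loads("""
{
    "Status": "success",
    "Data": [
        {
            "ID": "4615146",
            "Title": "Harris Teeter Cottage Cheese, Sour Cream, French",
            "Price": "2/$3",
            "SpecialPrice": "$1.27",
            "DepartmentName": "Dairy & Chilled Foods"
        }
    ]
}
""")


def summarize_specials(response):
    """Hypothetical helper: return (title, price, special price) tuples.

    Assumes the response has the "Status"/"Data" layout shown above.
    """
    if response.get("Status") != "success":
        return []
    return [
        (item.get("Title"), item.get("Price"), item.get("SpecialPrice"))
        for item in response.get("Data", [])
    ]


rows = summarize_specials(sample_response)
for title, price, special in rows:
    print(f"{title}: {price} (special: {special})")
```

In the script above you would pass weekly_specials_data instead of sample_response. Using .get() rather than direct indexing means a missing field yields None instead of raising KeyError, which seemed safer for an API whose fields I have only seen in this one response.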