How do I fix Beautiful Soup not returning the HTML I can see on the page?

Asked: 2019-07-14 13:13:38

Tags: python-3.x web-scraping beautifulsoup

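I can't retrieve the product data I need from a website. I can see the section of HTML that I think I need to scrape, but my code returns no data. It works for some HTML tags on the same page, but not for the ones I want.

I'm a real beginner. I have watched YouTube videos and tried to follow the questions and answers here. As far as I can tell, the data I need from the site may not be in the HTML itself, but embedded in it somehow (?).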

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.harristeeter.com/specials/weekly-list/best-deals'

# Download the page and parse it
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

# Count the matching <div> elements
print(len(page_soup.findAll("div", {"class": "product_infoBox"})))  # 0
print(len(page_soup.findAll("div", {"class": "container"})))  # 5

In the code, I can retrieve results for "container" (= 5) but not for "product_infoBox" (= 0). "product_infoBox" is the part I need.

1 Answer:

Answer 0: (score: 0)
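
A quick way to confirm that the elements are missing from the HTML the server sends (rather than being missed by the parser) is to search the raw response for the class name. This is only a minimal sketch: `class_in_raw_html` is a hypothetical helper name, and the sample bytes stand in for the `page_html` variable from the question.

```python
def class_in_raw_html(raw_html, class_name):
    """Return True if class_name occurs anywhere in the served HTML."""
    # urlopen(...).read() returns bytes, so decode before searching.
    if isinstance(raw_html, bytes):
        raw_html = raw_html.decode("utf-8", errors="replace")
    return class_name in raw_html


# Stand-in for page_html: the container is present, but no product boxes.
sample_html = b'<div class="container"><div id="app"></div></div>'

print(class_in_raw_html(sample_html, "container"))
print(class_in_raw_html(sample_html, "product_infoBox"))
```

If the class name is absent from the raw response, the browser is most likely adding those elements afterwards with JavaScript, and no HTML parser will find them in that response.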

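The page loads its data dynamically via JSON, but you can also get this data with requests. This script searches for a store, selects the first result, and loads the weekly specials: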

import json

import requests

store_search_url = 'https://www.harristeeter.com/api/v1/stores/search?Address={}&Radius=10000&AllStores=true&NewOrdering=false&OnlyPharmacy=false&OnlyFreshFood=false&FreshFoodOrdering=undefined'
weekly_specials_url = 'https://www.harristeeter.com/api/v1/stores/{}/departments/0/weekly_specials?'

headers = {'Referer': 'https://www.harristeeter.com/store-locator'}

with requests.Session() as s:
    # Visit the store locator page first (presumably so the session picks up any cookies)
    s.get('https://www.harristeeter.com/store-locator', headers=headers)
    store_search_data = s.get(store_search_url.format('pine ridge plaza, reynolda road'), headers=headers).json()

    # This prints all results from store search:
    # print(json.dumps(store_search_data, indent=4))

    # we select the first match:
    store_number = store_search_data['Data'][0]['Number']
    weekly_specials_data = s.get(weekly_specials_url.format(store_number), headers=headers).json()

    print(json.dumps(weekly_specials_data, indent=4))

Prints:

{
    "Status": "success",
    "Data": [
        {
            "ID": "4615146",
            "AdWeek": "2019-07-16",
            "DepartmentNumber": "4",
            "AdWeekExpires": "07/16/2019",
            "ActiveAdWeekRelease": "2019-07-16",
            "StartDate": "7/10/2019",
            "EndDate": "7/16/2019",
            "IsCardRequired": true,
            "Title": "Harris Teeter Cottage Cheese, Sour Cream, French",
            "Description": "8-16 oz",
            "Detail": "e-VIC Member Price $1.27",
            "Price": "2/$3",
            "SpecialPrice": "$1.27",
            "DesktopImageUrl": "https://23360934715048b8b9a2-b55d76cb69f0e86ca2d9837472129d5a.ssl.cf1.rackcdn.com/sm_4615146.jpg",
            "MobileImageUrl": "https://23360934715048b8b9a2-b55d76cb69f0e86ca2d9837472129d5a.ssl.cf1.rackcdn.com/sm_4615146.jpg",
            "Limit": "6",
            "Savings": "Save at Least 38\u00a2 on 2",
            "Size": "8-16 oz",
            "Subtitle": "Limit 6 at e-VIC Price",
            "IsAdded": false,
            "RetinaImageUrl": "https://23360934715048b8b9a2-b55d76cb69f0e86ca2d9837472129d5a.ssl.cf1.rackcdn.com/4615146.jpg",
            "TIE": "1",
            "Organic": "0",
            "Type": "EVIC",
            "DepartmentName": "Dairy & Chilled Foods"
        },
        {
            "ID": "4614507",

... and so on.
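
Once the JSON is loaded, the individual products can be read straight from the "Data" list. The following is a rough sketch rather than part of the script above: `summarize_specials` is a hypothetical helper, the sample dictionary is trimmed from the output shown above, and treating any "Status" other than "success" as a failure is an assumption.

```python
def summarize_specials(weekly_specials_data):
    """Return (title, price, department) tuples from the response dict."""
    # Assumption: anything other than "success" means there is no usable data.
    if weekly_specials_data.get("Status") != "success":
        return []
    return [
        (item.get("Title"), item.get("Price"), item.get("DepartmentName"))
        for item in weekly_specials_data.get("Data", [])
    ]


# Sample trimmed from the printed response.
sample = {
    "Status": "success",
    "Data": [
        {
            "Title": "Harris Teeter Cottage Cheese, Sour Cream, French",
            "Price": "2/$3",
            "DepartmentName": "Dairy & Chilled Foods",
        }
    ],
}

for title, price, department in summarize_specials(sample):
    print(f"{department}: {title} ({price})")
```

Using `.get()` for every key keeps the loop from raising an error if some items lack a field.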