How to scrape region-specific product prices

Asked: 2018-10-30 23:25:57

Tags: python html web-scraping beautifulsoup

As an exercise, I am trying to scrape information about washing machines from Lowes: https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977

To get the price, I need to find a div with the class "product-pricing" and then get the text of a span inside it. However, the div looks completely different when I inspect it in the browser than when I scrape it with BeautifulSoup. When I inspect it, it looks like this:

<div class="product-pricing">
<div class="pl-price js-pl-price" tabindex="-1">                 

     <!-- Was Price -->
     <div class="v-spacing-mini">
           <span class="h5 js-price met-product-price art-pl-contractPricing0" data-met-type="was">$499.00</span>
     </div>
     <div class="v-spacing-mini">
           <p class="darkMidGrey art-pl-wasPriceLbl0">was: $749.00</p>

              <small class="green small art-pl-saveThruLbl0">SAVE 33% thru 10/30/2018</small><br>
     </div>

  <!-- Start of Product Family Pricing -->

  <!-- Contractor Pack Messaging -->

  <!-- End of Product Family Pricing -->
  </div>
  <div class="v-spacing-small">
     <a role="link" tabindex="-1" data-toggle="popover" aria-haspopup="true" data-trigger="focus" data-placement="bottom auto" data-content="FREE local delivery applies to any major appliance $396 or more, full-size gas grills $498 or more, patio furniture orders $498 or more, and riding and ZTR mowers $999 or more. Applies to standard deliveries in US only. Purchase threshold calculated before taxes, after applicable discounts, if any. Additional fees may apply." data-original-title="Free Delivery" class="js-truck-delivery"><i class="icon-truck" title="FREE Delivery" aria-label="FREE Delivery."></i> <strong>FREE Delivery</strong></a>
  </div>
</div>
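For reference, once the scraper actually receives the markup above, the lookup itself is simple. A minimal stdlib-only sketch (a real scraper would use BeautifulSoup rather than a regex, this just shows which element the question is after):

```python
import re

# The relevant fragment of the HTML shown by the browser inspector.
html = '''<div class="v-spacing-mini">
      <span class="h5 js-price met-product-price art-pl-contractPricing0" data-met-type="was">$499.00</span>
</div>'''

# Grab the text of the span whose class list contains "js-price".
match = re.search(
    r'<span[^>]*class="[^"]*js-price[^"]*"[^>]*>([^<]+)</span>', html)
price = match.group(1)
print(price)  # $499.00
```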

But when I scrape the page, I get this instead:

<div class="product-pricing">
<div class="v-spacing-jumbo clearfix">
  <a aria-haspopup="true" class="js-enter-location" data-content="Since Lowes.com is national in scope, we check inventory at your local store first in an effort to fulfill your order more quickly. You may find product or pricing that differ from that of your local store, but we make every effort to minimize those differences so you can get exactly what you want at the best possible price." data-placement="top auto" data-toggle="popover" data-trigger="focus" role="link" tabindex="-1">
     <p class="h6" id="ada-enter-location"><span>Enter your location</span>
        <i aria-hidden="true" class="icon-info royalBlue"></i>
     </p>
  </a>
  <p class="small-type secondary-text" tabindex="-1">for pricing and availability.</p>
</div>
<form action="#" class="met-zip-container js-store-locator-form" data-modal-open="true" data-zip-in="true" id="store-locator-form">
  <input name="redirectUrl" type="hidden" value="/pl/Washing-machines-Washers-dryers-Appliances/4294857977"/>
  <div class="form-group product-form-group">
     <div class="input-group">
        <input aria-label="Enter your zip code" autocompletetype="find-a-store-search" class="form-control js-list-zip-entry-input met-zip-code" name="searchTerm" placeholder="ZIP Code" role="textbox" tabindex="-1" type="text"/>
        <span class="input-group-btn">
        <button class="btn btn-primary js-list-zip-entry-submit met-zip-submit" data-linkid="get-pricing-and-availability-zip-in-modal-submit" tabindex="-1" type="submit">OK</button>
        </span>
     </div>
     <span class="inline-help">ZIP Code</span>
  </div>
 </form>
</div>

I think this has to do with the site needing my location to determine the correct price. There seems to be a hidden input through which my browser tells the site my location. Is there a way to make BeautifulSoup scrape the price the site shows after it has checked my location?
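One approach worth trying before reaching for a browser: sites like this usually remember your store or ZIP in a cookie after you enter it once, so replaying that cookie (plus a browser-like User-Agent) with the request can sometimes return the priced page. A sketch with the stdlib; the cookie name/value below is a placeholder, not the real one, so inspect your own browser's cookies in DevTools and copy the actual pair:

```python
from urllib.request import Request, urlopen

URL = "https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977"

# Hypothetical cookie: "sn=1234" stands in for whatever store/location
# cookie the site actually sets after you enter a ZIP code.
req = Request(URL, headers={
    "User-Agent": "Mozilla/5.0",  # look like a browser rather than urllib
    "Cookie": "sn=1234",          # placeholder; copy the real pair from DevTools
})

# page_html = urlopen(req).read()  # then feed this to BeautifulSoup as before
```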

Here is the code I am using:

import re
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977'

uClient = uReq(my_url)

page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, features = "lxml")

containers = page_soup.findAll("div", {"class": "product-wrapper-right"})
for container in containers:
    price = container.findAll("span", {"class": "js-price"})[0].text
    print(price)

Edit: this is the code that gives me the second HTML:

container.findAll("div", {"class":"product-pricing"})   

1 Answer:

Answer 0 (score: 1):

Not 100% sure this will solve your problem, but using Selenium may help, since it drives an actual browser and therefore sends everything a normal browser sends when visiting the site (including running the location JavaScript).

An introduction to Selenium: https://medium.freecodecamp.org/better-web-scraping-in-python-with-selenium-beautiful-soup-and-pandas-d6390592e251
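A rough sketch of what that could look like here, assuming chromedriver is on your PATH; the `span.js-price` selector is taken from the markup shown in the question and may differ on the live site:

```python
def parse_price(text):
    """Turn a price string like '$499.00' into a float."""
    return float(text.replace("$", "").replace(",", "").strip())

def scrape_prices(url):
    # Imported here so parse_price stays usable without Selenium installed.
    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()
    try:
        driver.get(url)  # the browser runs the site's location JavaScript
        page_soup = BeautifulSoup(driver.page_source, "lxml")
        return [parse_price(span.text)
                for span in page_soup.select("span.js-price")]
    finally:
        driver.quit()
```

After the page renders in the real browser, `driver.page_source` contains the priced HTML, so the original BeautifulSoup lookups work unchanged.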