How to extract data from divs and classes

Posted: 2018-06-30 16:51:29

Tags: python web-scraping beautifulsoup

I'm new to Python. I need to get the title, ISBN, price, and publication date from the first scraped result:

<div class="col-md-7 col-sm-7">
  <h4><a href="https://www.fadavis.com/product/anatomy-physiology-pocket-AP-jones-3">Pocket Anatomy and Physiology, 3rd Edition</a></h4>
  <div>Shirley A. Jones</div>
  <div>ISBN-13: 978-0-8036-5658-1</div>
  <p class="price"> $39.95 (US)</p>

  <div class="prd_lst">
    <ul class="book_list">

    </ul>
    <div class="mobile_add_tocart">
      <button type="button" class="addtocart" onclick="window.location.href='https://shoppingcart.fadavis.com/ShoppingCart/AddToCart?guid=74779e63-ccfb-454e-a6b9-b4e9f9a50793&amp;productid=10959&amp;applicationid=5'"> <span class="cart_icon sprite pull-left"></span>Add to Cart</button>

    </div>

    <div class="popover bottom Available_tooltip"><div class="arrow"></div>
      <div class="popover-content">
        <ul class="book_list">

        </ul>
        <div class="clearfix"></div>
      </div>
    </div>

    <div class="clearfix"></div>
  </div>
  <p>Publication Date: 10/12/2016</p>
  <div class="available active">
    <div class="available_icon sprite pull-left"></div>
    Available</div>
</div>

1 Answer:

Answer 0 (score: 0)

import bs4

html = """
<div class="col-md-7 col-sm-7">
    <h4><a href="https://www.fadavis.com/product/anatomy-physiology-pocket-AP-jones-3">Pocket Anatomy and Physiology, 3rd Edition</a></h4>

    <div>Shirley A. Jones</div>
    <div>ISBN-13: 978-0-8036-5658-1</div>
    <p class="price"> $39.95 (US)</p>

    <div class="prd_lst">
        <ul class="book_list">

        </ul>
        <div class="mobile_add_tocart">
            <button type="button" class="addtocart" onclick="window.location.href='https://shoppingcart.fadavis.com/ShoppingCart/AddToCart?guid=74779e63-ccfb-454e-a6b9-b4e9f9a50793&amp;productid=10959&amp;applicationid=5'"> <span class="cart_icon sprite pull-left"></span>Add to Cart</button>

        </div>



        <div class="popover bottom Available_tooltip"><div class="arrow"></div>
            <div class="popover-content">
                    <ul class="book_list">

                    </ul>
                    <div class="clearfix"></div>
            </div>
        </div>

        <div class="clearfix"></div>
    </div>
    <p>Publication Date: 10/12/2016</p>
    <div class="available active">
        <div class="available_icon sprite pull-left"></div>
        Available</div>
</div>
"""

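# 'lxml' must be installed separately (pip install lxml); the built-in 'html.parser' also works
soup = bs4.BeautifulSoup(html, 'lxml')
# The product block is the div whose class list contains "col-md-7"
div = soup.find('div', {'class': 'col-md-7'})
title = div.find('h4')                     # <h4><a>title</a></h4>
divs = div.find_all('div')                 # divs[0] = author, divs[1] = ISBN
price = div.find('p', {'class': 'price'})
paras = div.find_all('p')                  # the last <p> holds the publication date

# get_text(strip=True) drops surrounding whitespace such as the space before "$39.95"
print(title.get_text(strip=True))
print(divs[0].get_text(strip=True))
print(divs[1].get_text(strip=True))
print(price.get_text(strip=True))
print(paras[-1].get_text(strip=True))

Output

Pocket Anatomy and Physiology, 3rd Edition
Shirley A. Jones
ISBN-13: 978-0-8036-5658-1
$39.95 (US)
Publication Date: 10/12/2016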
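The answer relies on position (`divs[0]`, `divs[1]`, the last `<p>`), which breaks if the page adds or reorders an element. Below is a minimal sketch of a label-based alternative, assuming the markup keeps the `ISBN-13:` and `Publication Date:` prefixes shown in the question. It picks the title and price with CSS selectors (`select_one`) and finds the ISBN and date by their text labels. `parse_product` and `text_after_label` are hypothetical helper names, the HTML is a trimmed copy of the question's snippet, and reading `10/12/2016` as month/day/year is an assumption about the site.

```python
import re
from datetime import datetime
from decimal import Decimal

from bs4 import BeautifulSoup

# Trimmed copy of the product markup from the question (cart/popover parts omitted).
HTML = """
<div class="col-md-7 col-sm-7">
  <h4><a href="https://www.fadavis.com/product/anatomy-physiology-pocket-AP-jones-3">Pocket Anatomy and Physiology, 3rd Edition</a></h4>
  <div>Shirley A. Jones</div>
  <div>ISBN-13: 978-0-8036-5658-1</div>
  <p class="price"> $39.95 (US)</p>
  <p>Publication Date: 10/12/2016</p>
</div>
"""


def text_after_label(block, tag, label):
    """Hypothetical helper: return the text of the first <tag> starting with `label`, minus the label."""
    for el in block.find_all(tag):
        text = el.get_text(strip=True)
        if text.startswith(label):
            return text[len(label):].strip()
    return None


def parse_product(block):
    """Hypothetical helper: extract title, ISBN, price and date from one product block."""
    title = block.select_one("h4 a").get_text(strip=True)
    isbn = text_after_label(block, "div", "ISBN-13:")
    price_text = block.select_one("p.price").get_text(strip=True)  # e.g. "$39.95 (US)"
    # Assumption: a plain number without thousands separators follows the "$".
    price = Decimal(re.search(r"\d+(?:\.\d+)?", price_text).group())
    date_text = text_after_label(block, "p", "Publication Date:")  # e.g. "10/12/2016"
    # Assumption: the site uses US month/day/year ordering.
    published = datetime.strptime(date_text, "%m/%d/%Y").date() if date_text else None
    return {"title": title, "isbn": isbn, "price": price, "published": published}


soup = BeautifulSoup(HTML, "html.parser")
product = parse_product(soup.select_one("div.col-md-7"))
print(product)
```

If the listing page contains several such blocks, the same helper could be applied to each element returned by `soup.select("div.col-md-7")`.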