Getting href links from a tags

Date: 2020-09-10 23:20:04

Tags: python python-3.x web-scraping beautifulsoup

This is only part of the HTML; the page contains multiple products with the same HTML structure.

I want the href of every product on the page.

<div class="row product-layout-category product-layout-list">
    <div class="product-col wow fadeIn animated" style="visibility: visible;">
        <a href="the link I want" class="product-item">
            <div class="product-item-image">
                <img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
            </div>
            <div class="product-item-desc">
                <p><span><strong>brand</strong></span></p>                                            
                <p><span class="font-size-16">name of the product</span></p>
                <p class="product-item-price">
                    <span>product price</span></p>
            </div>
        </a>
    </div>
.
.
.

With the code I wrote below, all I get is None printed over and over:

from bs4 import BeautifulSoup
import requests

url = 'link to the site'
response = requests.get(url)

page = response.content

soup = BeautifulSoup(page, 'html.parser')


##this includes the part that I gave you
items = soup.find('div', {'class': 'product-layout-category'})

allItems = items.find_all('a')

for n in allItems:
    print(n.href)

How can I get it to print all the hrefs?

1 answer:

Answer 0 (score: 0)

Looking at your HTML, you can use the CSS selector a.product-item. It selects every <a> tag with class="product-item":

from bs4 import BeautifulSoup


html_text = """
<div class="row product-layout-category product-layout-list">
    <div class="product-col wow fadeIn animated" style="visibility: visible;">
        <a href="the link I want" class="product-item">
            <div class="product-item-image">
                <img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
            </div>
            <div class="product-item-desc">
                <p><span><strong>brand</strong></span></p>
                <p><span class="font-size-16">name of the product</span></p>
                <p class="product-item-price">
                    <span>product price</span></p>
            </div>
        </a>
    </div>
"""

soup = BeautifulSoup(html_text, "html.parser")

# Select every <a class="product-item"> and read its href attribute
for link in soup.select("a.product-item"):
    print(link.get("href"))  # or link["href"]

Prints:

the link I want
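
As for why the code in the question prints None: in Beautiful Soup, dot access on a tag (n.href) looks for a child tag named <href>, not for the href attribute, so it returns None when no such child exists. The sketch below illustrates the difference on a trimmed two-product page; the sample markup and the /product/1 and /product/2 paths are made up for illustration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical two-product page, trimmed to the parts that matter here.
sample_html = """
<div class="row product-layout-category product-layout-list">
    <div class="product-col">
        <a href="/product/1" class="product-item"><span>first product</span></a>
    </div>
    <div class="product-col">
        <a href="/product/2" class="product-item"><span>second product</span></a>
    </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
first_link = soup.find("a", class_="product-item")

# Dot access searches for a child tag named <href>, so it yields None.
child_lookup = first_link.href

# HTML attributes are read dictionary-style; [href] skips anchors without one.
hrefs = [a["href"] for a in soup.select("a.product-item[href]")]

print(child_lookup)  # None
print(hrefs)         # ['/product/1', '/product/2']
```

In the question's loop, replacing print(n.href) with print(n.get("href")) should be enough. n.get("href") returns None for an anchor without an href, whereas n["href"] raises KeyError.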