Question

I'm a beginner, and I'm trying to grab the a hrefs that are each nested inside a bunch of div classes. When I inspect the element, it looks like this:

<div class="item hentry" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="1252224732659290211">
<img class="thumbnail" src="//img.youtube.com/vi/fX_kx_drRsY/0.jpg" style="width: 30px; height: 30px;">
  <h3 class="title entry-title" itemprop="name">
    <a href="the link i want to extract"></a>
    </h3>
</div>

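I've been searching on Stack Overflow, but most examples cover the case where the div class is fixed. My page has div classes that are not fixed, and the data-id differs from one div to the next.

I tried the following, but I think it only works when the div class is fixed?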

# Text mode ("w"): in Python 3, writing a str to a file opened with "wb" raises TypeError.
with open("list_of_urls.txt", "w") as f:
    for item in soup.find_all("div", attrs={"class" : "item hentry"}):
        for link in item.find_all('a'):
            f.write("%s\n" % link["href"])

Answer 1

soup.select('div[class] a')  # find all a tags under the div tag which has class attribute

Use a CSS selector.
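
To make the selector concrete, here is a minimal sketch run against markup modelled on the question. The second item, the example.com URLs, and the trailing paragraph are made up for illustration, and the parser choice ("html.parser") is an assumption. The selector is narrowed to `a[href]` so that anchors without an href are skipped.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the second item, the URLs,
# and the <p> at the end are invented for this example.
html = """
<div class="item hentry" data-id="1252224732659290211">
  <h3 class="title entry-title" itemprop="name">
    <a href="https://example.com/first">First</a>
  </h3>
</div>
<div class="item hentry" data-id="42">
  <h3 class="title entry-title" itemprop="name">
    <a href="https://example.com/second">Second</a>
  </h3>
</div>
<p><a href="https://example.com/ignored">Not inside a div</a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# 'div[class] a[href]' matches <a> tags that carry an href and sit anywhere
# below a <div> that has a class attribute, regardless of its data-id.
links = [a["href"] for a in soup.select("div[class] a[href]")]

print(links)
```

The data-id never appears in the selector, so it does not matter that it changes from div to div. If other divs on the page also carry a class, a narrower selector such as `div.hentry a[href]` may be a better fit. The resulting list can then be written to a file the same way as in the question.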
