Question

<div class="product_image clearfix"> <img src="https://res.sastasundar.com/incom/images/product/thumb/XPLOR-Dark-Chocolate-Brownie-1542880911-10051353-1.jpg" title="XPLOR Dark Chocolate Brownie 50 gm" class=" center-block"> </div>

I'm using Python and Beautiful Soup.

I can't find this div with:

links = soup.find_all('div', attrs={'class': 'product_image clearfix'})

After that, I need to extract the image.

Answer 1

With the current version of BS, this should work:

links = soup.find_all('div', class_='product_image clearfix')
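
One caveat worth illustrating: passing the full string to class_ matches the class attribute as an exact string, so a div with the same classes in a different order would be missed. The sketch below compares that with a CSS selector, which does not care about order. The two-div HTML snippet and the file names a.jpg and b.jpg are made up for the demonstration.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes, written in both orders.
html = """
<div class="product_image clearfix"><img src="a.jpg"></div>
<div class="clearfix product_image"><img src="b.jpg"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact-string match: only the div whose class attribute reads
# "product_image clearfix" is returned.
exact = soup.find_all("div", class_="product_image clearfix")

# CSS selector: any div carrying both classes, in any order.
both = soup.select("div.product_image.clearfix")

print(len(exact), len(both))
```

If the site ever reorders or adds classes, the selector form is likely the more robust of the two.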

Answer 2

Which version of BeautifulSoup are you using? You should be able to print the contents of the div with the following:

from bs4 import BeautifulSoup

html = """<div class="product_image clearfix">
  <img src="https://res.sastasundar.com/incom/images/product/thumb/XPLOR-Dark-Chocolate-Brownie-1542880911-10051353-1.jpg" title="XPLOR Dark Chocolate Brownie 50 gm" class=" center-block">
</div>"""

soup = BeautifulSoup(html, 'html.parser')

for div in soup.find_all('div', class_='product_image clearfix'):
  for img in div.find_all('img', recursive=False):
    print(img)
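
Since the question asks for the image itself, here is a small sketch that goes one step further and pulls the src and title attributes out of each img tag instead of printing the whole tag. It reuses the HTML from the question; the list name image_urls is only illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="product_image clearfix">
  <img src="https://res.sastasundar.com/incom/images/product/thumb/XPLOR-Dark-Chocolate-Brownie-1542880911-10051353-1.jpg" title="XPLOR Dark Chocolate Brownie 50 gm" class=" center-block">
</div>"""

soup = BeautifulSoup(html, "html.parser")

image_urls = []  # illustrative name, not from the original answer
for div in soup.find_all("div", class_="product_image clearfix"):
    for img in div.find_all("img"):
        src = img.get("src")  # .get() returns None if the attribute is missing
        if src:
            image_urls.append(src)
            print(img.get("title"), src)
```

Using .get() rather than img['src'] avoids a KeyError on any img tag that happens to lack the attribute.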

Answer 3

From what I gathered from the documentation, this is one approach that works.

You can get the desired tags with:

tags = soup.find_all('div', "product_image clearfix")

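Here the second argument is matched against the element's class by default. You can then look at the child tags either by using .contents to get them as a list or by iterating over them with .children. In this example, for simplicity, I use .children and take the image source from the first matching tag found: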

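import bs4

html = '<div class="product_image clearfix"> <img src="https://res.sastasundar.com/incom/images/product/thumb/XPLOR-Dark-Chocolate-Brownie-1542880911-10051353-1.jpg" title="XPLOR Dark Chocolate Brownie 50 gm" class=" center-block"></div>'

# Name the parser explicitly so the result does not depend on what is installed.
soup = bs4.BeautifulSoup(html, 'html.parser')

tags = soup.find_all('div', "product_image clearfix")

img_src = None

for t in tags[0].children:
    # Skip NavigableString children (whitespace, newlines); keep only the img tag.
    if isinstance(t, bs4.element.Tag) and t.name == 'img':
        img_src = t['src']
        break  # stop at the first match

print(img_src)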

The type check is important because tags[0].children can contain bs4.element.NavigableString objects when there are newlines or spaces in the markup, depending on the HTML parser.
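
To see why the check matters, the following sketch lists the type of each child of a div that is written across several lines. The shortened file name pic.jpg is a placeholder, and the exact mix of children may differ with another parser. As an alternative to filtering by type, find('img') only ever returns tags.

```python
import bs4

# Placeholder markup with newlines and indentation around the img tag.
html = """<div class="product_image clearfix">
  <img src="pic.jpg" class="center-block">
</div>"""

soup = bs4.BeautifulSoup(html, "html.parser")
div = soup.find("div", "product_image clearfix")

# Whitespace between tags shows up as NavigableString children.
child_types = [type(child).__name__ for child in div.children]
print(child_types)

# find() skips text nodes entirely, so no type check is needed.
img = div.find("img")
img_src = img["src"] if img is not None else None
print(img_src)
```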

Answer 4

The full set is loaded dynamically. You can make the same request that the page makes.
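
A rough sketch of that idea, under heavy assumptions: the endpoint URL and query parameters below are hypothetical placeholders and do not come from the original answer. The real ones would have to be read from the browser's network tab while the page loads. The function names fetch_fragment and extract_image_urls are likewise only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hypothetical endpoint and parameters; replace them with the request
# observed in the browser's network tab.
ENDPOINT = "https://example.com/products/list"
PARAMS = {"page": 1}


def fetch_fragment(endpoint, params):
    """Fetch the HTML that the page itself requests dynamically."""
    url = f"{endpoint}?{urlencode(params)}"
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_image_urls(html):
    """Return the src of every img inside a product_image div."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.select("div.product_image img[src]")]


# Offline check with a fragment shaped like the one in the question.
sample = '<div class="product_image clearfix"> <img src="pic.jpg"> </div>'
print(extract_image_urls(sample))
```

With a real endpoint in place, extract_image_urls(fetch_fragment(ENDPOINT, PARAMS)) would chain the two steps. If the site returns JSON rather than HTML, the parsing step would need to change accordingly.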


How to extract the contents (image) of a div using Beautiful Soup
