Question

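My goal is to parse the images on the second page. I am using bs4 and Python 3 for this. Please look at these two pages:

1) the page that shows images in all 4 colours (I can parse this page);

2) the page that contains images in only one colour (chrome, in this example). This is the page I need to parse.

In a browser, I can see that the second page differs from the first. With bs4, however, I get the same result for both pages, because Python does not seem to recognize the ".html#/kolor-chrom" part of the second address.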

First page address: "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html".

Second page address: "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom".

Code to reproduce:

from bs4 import BeautifulSoup
import requests

adres1 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"
adres2 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom"

def parse_one_page(adres):
    """Parse one page and return the src of every img found at adres"""
    # Send a User-Agent header so the request looks like it comes from a browser
    headers = {'User-Agent': 'Mozilla/5.0'}
    # Get the page
    page = requests.get(adres, headers=headers)  # read_timeout=5
    # Parse the returned HTML
    soup = BeautifulSoup(page.content, 'html.parser')
    # Find all div elements with class "clearfix" and pick the one holding the images
    divclear = soup.find_all("div", class_="clearfix")
    divclear = divclear[9]
    # Find the img tags inside the first child of that div
    imgtag = [i.find_all("img") for i in divclear][0]
    # Collect the src attribute of each img
    src = [i["src"] for i in imgtag]
    # Print how many images were found
    print(len(src))
    # Return the list of img src values
    return src


print(parse_one_page(adres1))
print(parse_one_page(adres2))

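After running this code, you will see the same output for both addresses: 24 images each. The first page really does have 24 images (correct). But the second page should have only 2 images, not 24 (incorrect)!

So I hope someone can help me parse the second page correctly with bs4 in Python 3.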

Answer 1

Yes, it seems that this kind of responsive page cannot be parsed with bs4.
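
A likely reason, as far as I can tell: everything after the "#" is a URL fragment. HTTP clients do not send the fragment to the server, so both addresses fetch the same HTML, and the colour selection presumably happens afterwards in the browser via JavaScript. Here is a minimal sketch that shows this with the standard library only; split_fragment is a hypothetical helper name, not something from the question.

```python
from urllib.parse import urldefrag

adres1 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"
adres2 = adres1 + "#/kolor-chrom"


def split_fragment(url):
    """Split a URL into the part requested from the server and the fragment.

    Hypothetical helper for illustration only.
    """
    base, fragment = urldefrag(url)
    return base, fragment


base1, frag1 = split_fragment(adres1)
base2, frag2 = split_fragment(adres2)

# Both addresses point at the same resource on the server
print(base1 == base2)
# Only the client ever sees this part
print(repr(frag2))
```

If that is what is happening here, bs4 alone cannot see the chrome-only view. You would probably need something that executes the page's JavaScript (a browser automation tool), or you would have to find where the page stores its per-colour image data. I have not verified either approach against this site.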
