BeautifulSoup does not show some tags that are in the HTML page

Asked: 2019-03-30 00:03:05

Tags: python python-3.x web-scraping beautifulsoup

If I visit this page and inspect it in the browser, I can see the image on the page inside an img tag.

However, when I fetch the page with requests and parse it with BeautifulSoup, I cannot find that same image. What am I missing here?

The code runs without errors, and the request returns 200 as its status_code.

import requests
from bs4 import BeautifulSoup

url = 'https://mangadex.org/chapter/435396/2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

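# Fetch the HTML exactly as the server sends it (no JavaScript is executed).
page = requests.get(url, headers=headers)
print(page.status_code)

# Parse that static HTML and print every <img> tag found in it.
soup = BeautifulSoup(page.text, 'html.parser')
img_tags = soup.find_all('img')
for img in img_tags:
    print(img)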

Edit:

As suggested, the Selenium option works. But is there a way to make it as fast as BeautifulSoup?

2 Answers:

Answer 0: (score: 1)

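The page's JavaScript has to run to populate some of the elements on the page. requests only downloads the HTML the server sends and never executes scripts, so those elements are missing from what BeautifulSoup parses. You can use Selenium to run the page's JavaScript before accessing the images.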
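The effect can be reproduced offline. The sketch below uses only the standard library's html.parser, which is the parser backend the question passes to BeautifulSoup. The HTML string and the TagCounter class are made up for illustration; they are assumptions, not the real MangaDex markup.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags by name in a static HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


# Hypothetical stand-in for what requests receives: an empty container
# plus a script that would add the <img> only when run in a browser.
STATIC_HTML = """
<html><body>
  <div id="reader"></div>
  <script>
    var img = document.createElement('img');
    img.src = '/data/x1.png';
    document.getElementById('reader').appendChild(img);
  </script>
</body></html>
"""

counter = TagCounter()
counter.feed(STATIC_HTML)

# The parser sees the script element but no img element, because the
# script body is plain text to it and is never executed.
print(counter.counts.get("img", 0))     # 0
print(counter.counts.get("script", 0))  # 1
```

A browser (or Selenium driving one) runs the script first, so the img element exists by the time the DOM is inspected.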

Answer 1: (score: 0)

You can get the images through the site's API instead. The code below fetches all the images for the chapter and prints their URLs:

import requests

headers = {
    'Accept': 'application/json, text/plain, */*',
    'Referer': 'https://mangadex.org/chapter/435396/2',
    'DNT': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.86 Safari/537.36',
}

params = (
    ('id', '435396'),
    ('type', 'chapter'),
    ('baseURL', '/api'),
)

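# Query the API endpoint for the chapter; the response is JSON, not HTML.
response = requests.get('https://mangadex.org/api/', headers=headers, params=params)
data = response.json()

# Each image URL is the base URL, the chapter hash, and the page file name.
img_base_url = "https://s4.mangadex.org/data"
img_hash = data["hash"]
img_names = data["page_array"]

for img in img_names:
    print(f"{img_base_url}/{img_hash}/{img}")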

Output:

  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x1.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x2.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x3.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x4.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x5.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x6.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x7.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x8.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x9.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x10.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x11.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x12.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x13.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x14.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x15.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x16.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x17.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x18.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x19.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x20.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x21.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x22.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x23.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x24.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x25.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x26.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x27.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x28.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x29.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x30.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x31.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x32.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x33.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x34.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x35.png
  https://s4.mangadex.org/data/ac081a99e13d8765d48e55869cd5444c/x36.png
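
The URL-building step can be pulled out into a small function so it can be checked without a network call. This is only a sketch: build_page_urls is a hypothetical helper name, the sample dictionary is hand-written in the shape the answer relies on (the "hash" and "page_array" keys) rather than a live API response, and the base URL is assumed to be fixed, as it is in the answer.

```python
def build_page_urls(chapter_data, base_url="https://s4.mangadex.org/data"):
    """Return one image URL per entry in the chapter's page_array."""
    img_hash = chapter_data["hash"]
    return [f"{base_url}/{img_hash}/{name}" for name in chapter_data["page_array"]]


# Hand-written sample, not a real response: only the two keys used above.
sample = {
    "hash": "ac081a99e13d8765d48e55869cd5444c",
    "page_array": ["x1.png", "x2.png", "x3.png"],
}

for url in build_page_urls(sample):
    print(url)
```

With a real response, the call would be build_page_urls(response.json()). Because this path needs only one small JSON request and no browser, it addresses the speed concern raised in the question's edit.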