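Downloading a specific image from a website with BeautifulSoup

Time: 2020-04-19 19:06:48

Tags: python html image beautifulsoup

I am trying to download an image from a website using BeautifulSoup.

This is the relevant HTML snippet from the site: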

<div class="c-image _verticalMode">
    <div class="c-image__inner">
        <img src="https://images.example.com/qwe098.jpg/dims/optimize" class="c-image__image" width="100%">
    </div>
</div>

Here is what I have written so far:

import requests
from bs4 import BeautifulSoup as bs

url=r'https://www.example.com/d?tNo=123&aNo=17'

soup=bs(requests.get(url).content,'html.parser')
pics=soup.find(class_='c-image')
print(pics)

Printed output:

<div class="c-image _verticalMode">
<!--
        <div class="c-image__inner">
          <img src="../../img/c/dummy.jpg" class="c-image__image" width="100%">
        </div>
        -->
<!--
        <a href="#" class="c-img__prev"><i class="i-arrow-left-black"></i></a>
        <a href="#" class="c-img__next"><i class="i-arrow-right-black"></i></a>
      -->
</div>

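The img src appears truncated (because of BeautifulSoup?), and it differs from the one in the first HTML snippet even though it is in the same place in the markup.

I can't seem to get the link. I tried soup.find(class_='c-image__image'), but it returned a NoneType object.

What should I do to get the correct image link so that I can download it?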

1 answer:

Answer 0 (score: 1)

Working strictly from your HTML snippet:

my_img = """[your html snippet]"""

from bs4 import BeautifulSoup as bs
soup = bs(my_img,'lxml')
pics=soup.select_one('div.c-image__inner img')
print(pics['src'])

Output:

https://images.example.com/qwe098.jpg/dims/optimize
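
A note on why soup.find(class_='c-image__image') returned None: in the printed output from the question, the img tag sits inside an HTML comment (`<!-- ... -->`), and BeautifulSoup treats comments as text nodes, not tags. The sketch below shows one way to inspect such comments. It is a minimal illustration, not a guaranteed fix: `FETCHED_HTML` is copied from the question's printed output, `commented_img_srcs` is a hypothetical helper name, and `html.parser` is used only to avoid the extra lxml dependency. Since the commented-out src is a `dummy.jpg` placeholder, it seems likely (an assumption) that the real URL is filled in by JavaScript at runtime, in which case it will not appear in the HTML that requests receives.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, Comment

# Markup copied from the question's printed output (what requests received).
FETCHED_HTML = """
<div class="c-image _verticalMode">
<!--
        <div class="c-image__inner">
          <img src="../../img/c/dummy.jpg" class="c-image__image" width="100%">
        </div>
        -->
</div>
"""

PAGE_URL = "https://www.example.com/d?tNo=123&aNo=17"


def commented_img_srcs(html, page_url):
    """Return absolute src URLs of <img> tags hidden inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    srcs = []
    # Comments are string nodes, so tag searches never look inside them.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Re-parse the comment text as HTML to reach the tags it contains.
        inner = BeautifulSoup(comment, "html.parser")
        for img in inner.find_all("img", src=True):
            # Resolve relative paths against the page URL.
            srcs.append(urljoin(page_url, img["src"]))
    return srcs


soup = BeautifulSoup(FETCHED_HTML, "html.parser")
print(soup.find(class_="c-image__image"))  # None: the tag is commented out
print(commented_img_srcs(FETCHED_HTML, PAGE_URL))
```

Once a real image URL is known, the bytes can be fetched with requests.get(img_url).content and written to a file opened in 'wb' mode. If the URL only appears after scripts run, it may be worth checking the page's network requests in the browser's developer tools to see where the image address comes from.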