I am trying to download an image from a website using BeautifulSoup.
Here is the relevant HTML snippet from the site:
<div class="c-image _verticalMode">
<div class="c-image__inner">
<img src="https://images.example.com/qwe098.jpg/dims/optimize" class="c-image__image" width="100%">
</div>
</div>
Here is what I have written so far:
import requests
from bs4 import BeautifulSoup as bs
url=r'https://www.example.com/d?tNo=123&aNo=17'
soup=bs(requests.get(url).content,'html.parser')
pics=soup.find(class_='c-image')
print(pics)
Printed output:
<div class="c-image _verticalMode">
<!--
<div class="c-image__inner">
<img src="../../img/c/dummy.jpg" class="c-image__image" width="100%">
</div>
-->
<!--
<a href="#" class="c-img__prev"><i class="i-arrow-left-black"></i></a>
<a href="#" class="c-img__next"><i class="i-arrow-right-black"></i></a>
-->
</div>
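The img src looks truncated (because of BeautifulSoup?), and it differs from the one in the first HTML snippet even though it sits in the same position.
I can't seem to get the link. I tried soup.find(class_='c-image__image'), but it returned None.
What should I do to get the correct image link so that I can download it?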
Answer 0 (score: 1)
Working strictly from your HTML snippet:
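from bs4 import BeautifulSoup as bs
# Paste the HTML snippet from the question in place of the placeholder.
my_img = """[your html snippet]"""
soup = bs(my_img, 'lxml')  # the 'lxml' parser requires the lxml package
# Select the <img> nested inside the div with class c-image__inner.
pics = soup.select_one('div.c-image__inner img')
print(pics['src'])
Output (with the first HTML snippet from the question pasted in):
https://images.example.com/qwe098.jpg/dims/optimize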
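The printed output in the question suggests that, in the HTML the server actually returns, the img tag sits inside an HTML comment (<!-- ... -->). That would explain why soup.find(class_='c-image__image') returns None: ordinary searches do not look inside comments. Below is a minimal sketch of reading tags out of comments, using the markup copied from the question's printed output. The helper name images_in_comments is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Comment

# Markup copied from the printed output in the question.
html = """
<div class="c-image _verticalMode">
<!--
<div class="c-image__inner">
<img src="../../img/c/dummy.jpg" class="c-image__image" width="100%">
</div>
-->
<!--
<a href="#" class="c-img__prev"><i class="i-arrow-left-black"></i></a>
<a href="#" class="c-img__next"><i class="i-arrow-right-black"></i></a>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Ordinary searches skip comment contents, so the <img> is not found.
direct_hit = soup.find(class_="c-image__image")
print(direct_hit)


def images_in_comments(soup):
    """Hypothetical helper: collect the src of every <img> inside HTML comments."""
    sources = []
    # Comment nodes are strings, so match them with a string filter.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Re-parse the comment text as HTML to reach the tags inside it.
        inner = BeautifulSoup(comment, "html.parser")
        sources.extend(img["src"] for img in inner.find_all("img", src=True))
    return sources


comment_srcs = images_in_comments(soup)
print(comment_srcs)
```

Note that the src recovered this way is only the placeholder ../../img/c/dummy.jpg. The real image URL is probably filled in by JavaScript after the page loads (an assumption based on the difference between the two snippets), so it may have to be found elsewhere in the response or obtained with a tool that renders the page.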