Trying to scrape image sources using Beautiful Soup and Python

Time: 2017-07-31 14:42:19

Tags: html python-3.x beautifulsoup

I am trying to retrieve the image source from an img tag. Here is a snippet of the HTML I am working with:

            <img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" src="//assets.academy.com/mgen/81/10762881.jpg?is=500,500" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';">

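This line is repeated throughout the page's HTML, once for each clothing item, and I want to get the image source held in the "src" attribute of each img tag. My current Python code prints each img tag in full rather than just its source.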

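from bs4 import BeautifulSoup as soup

# Parse the saved page with the lxml parser
with open('Mens_Shirts.html', "r") as menShirts:
    page_soup = soup(menShirts, "lxml")

# Collect every <img> tag in the document
image = page_soup.find_all("img")

# Prints each whole tag, not just its src attribute
for i in image:
    print(i)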

Result:

<img alt="" src="//content.academy.com/aurora/category/2017/clothing/men/fishingshirts-hd.jpg" width="100%"/>
<img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/81/10762881.jpg?is=500,500"/>
<img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" data-blzexdl="1" data-feo-orig-src="//assets.academy.com/mgen/39/10739939.jpg?is=500,500" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="http://1.resources.www.academy.com.edgekey.net/4/W/zhmM8JXG8.webp"/>
<img alt="Rawlings Men's 3/4 Sleeve T-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/59/10137459.jpg?is=500,500"/>
<img alt="BCG Men's Turbo Mesh Short Sleeve T-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/12/10740412.jpg?is=500,500"/>
<img alt="Nike Men's Elite Back Stripe T-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/77/10568677.jpg?is=500,500"/>

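I tried to get the image source from "src=", but the code I tried did not give the desired output. What is the best way to extract the image source from "src="? More specifically, most of the image sources start with "//assets.academy.com".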
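One possible approach, sketched here under a few assumptions: Beautiful Soup tags expose their attributes through a dictionary-style interface, so `img.get("src")` returns the attribute value, or `None` when the attribute is missing. The helper name `image_sources`, its `prefix` parameter, and the inline sample markup (which stands in for Mens_Shirts.html) are hypothetical and only for illustration. The sketch also assumes that, when a tag carries a `data-feo-orig-src` attribute as in the third line of the result above, that attribute holds the original image URL and `src` has been rewritten; that is a guess based on the output, not something the page confirms.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the <img> tags in the result above.
# It stands in for the contents of Mens_Shirts.html.
html = """
<img alt="" src="//content.academy.com/aurora/category/2017/clothing/men/fishingshirts-hd.jpg" width="100%"/>
<img alt="Rawlings Men's 3/4 Sleeve T-shirt" src="//assets.academy.com/mgen/59/10137459.jpg?is=500,500"/>
<img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" data-feo-orig-src="//assets.academy.com/mgen/39/10739939.jpg?is=500,500" src="http://1.resources.www.academy.com.edgekey.net/4/W/zhmM8JXG8.webp"/>
<img alt="no source here"/>
"""


def image_sources(markup, prefix="//assets.academy.com"):
    """Return the source URL of every <img> whose source starts with `prefix`.

    Hypothetical helper: the name and the `prefix` parameter are illustrative.
    """
    page_soup = BeautifulSoup(markup, "html.parser")
    sources = []
    for img in page_soup.find_all("img"):
        # Assumption: data-feo-orig-src, when present, is the original URL
        # and src has been rewritten, so prefer it and fall back to src.
        src = img.get("data-feo-orig-src") or img.get("src")
        # Skip tags with no source at all, and sources outside the prefix.
        if src and src.startswith(prefix):
            sources.append(src)
    return sources


for src in image_sources(html):
    print(src)
```

Using `img.get("src")` instead of `img["src"]` avoids a `KeyError` on tags that have no `src` attribute. The matching URLs are protocol-relative (they start with `//`), so a scheme such as `https:` would need to be prepended before downloading them.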

0 answers:

No answers