I have this problem: I don't know how to display a single img. For example:
<img srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w" src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg">
As you can see above, there are several alternate images, but I am trying to scrape just one image to display.
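Those alternates come from the srcset value. It is a comma-separated list of candidates, and each candidate is a URL followed by a width descriptor such as 180w. The sketch below splits such a string into (url, width) pairs. It assumes every candidate uses a w descriptor and that no URL contains a comma. parse_srcset is a hypothetical helper name, and the example.com URLs are placeholders standing in for the real ones:

```python
def parse_srcset(srcset):
    """Split a srcset value into (url, width) pairs.

    Assumes width descriptors like "180w"; any other descriptor
    (e.g. a density like "2x") yields width None in this sketch.
    Also assumes no URL contains a comma.
    """
    candidates = []
    for part in srcset.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. from a trailing comma
        pieces = part.split(None, 1)  # URL, then optional descriptor
        url = pieces[0]
        descriptor = pieces[1].strip() if len(pieces) > 1 else ""
        width = int(descriptor[:-1]) if descriptor.endswith("w") else None
        candidates.append((url, width))
    return candidates


example = (
    "http://example.com/s180/photo.jpg 180w, "
    "http://example.com/s390/photo.jpg 390w, "
    "http://example.com/s458/photo.jpg 458w"
)
for url, width in parse_srcset(example):
    print(width, url)
```

Once the candidates are in a list, showing a single image means picking one entry from it, for example the one with the largest width.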
import bs4 as bs
import urllib.request
import datetime
import random
import re
random.seed(datetime.datetime.now().timestamp())
sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
#
title = soup.title
link = soup.link
# image = re.search(img 'srcset=img(.*?),)
# this doesn't work (it isn't valid Python), not sure how to fix it
strong = soup.strong
description = soup.description
location = soup.location
title = soup.find('h1', class_='publication-font')
image = soup.find('img')
strong = soup.find('strong')
location = soup.find('em').find('a')
description = soup.find('div', class_='description')
#Previous Code
print("H1:", title.text)
print("Article Link:", link)
print("Image Url:\n", image)
print("1st Paragraph:\n", strong.text)
print("2nd Paragraph:\n", description.string)
print("Location:\n", location.text)
My code is above. When I ran my previous attempt, the result displayed:
Greater Manchester News
<link href="rss.xml" rel="alternate" title="Default home feed" type="application/rss+xml"/>
<img data-src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg" data-srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w"/>
Family of dad stabbed in the neck while defending his fiancée from thugs speak of their heartbreak
Mike Grimshaw, 34, died after being stabbed in the neck outside his home in Trafford last Thursday
Trafford
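The result shows several image URLs, but I am trying to display only a single image link. How can I do that?
Any ideas would be greatly appreciated.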
Answer 0 (score: 0)
You can read the data-src or data-srcset attribute to get the image you want:
image = soup.find('img')
single_img = image.get('data-src')  # returns the main image link
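In the output shown in the question, the tag has no src at all. It only has data-src and data-srcset, which suggests the page lazy-loads its images. The sketch below reads data-src and falls back to a plain src. It assumes bs4 is installed and uses "html.parser" so that lxml is not required. main_image_url is a hypothetical helper, and the HTML is a shortened stand-in for the scraped tag:

```python
from bs4 import BeautifulSoup


def main_image_url(img):
    """Return the main image URL of an <img> tag, or None.

    Lazy-loaded tags keep the URL in data-src; plain tags use src.
    Tag.get returns None for a missing attribute, so `or` falls
    through to the next option.
    """
    if img is None:  # soup.find() returns None when nothing matches
        return None
    return img.get("data-src") or img.get("src")


html = (
    '<img data-src="http://example.com/s615/photo.jpg" '
    'data-srcset="http://example.com/s180/photo.jpg 180w">'
    '<img src="http://example.com/plain.jpg">'
)
soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):
    print(main_image_url(img))
```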
Or:
import re
image = soup.find('img')
img_string = image.get('data-srcset')  # this returns a string you have to parse
img_set = re.findall(r'(https?://[^\s]+)', img_string)  # regex to match only the links
Then you can access whichever index you want in img_set (just check the length of the list first).
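Instead of hard-coding an index, you could pick the candidate with the largest width descriptor. This sketch assumes that every URL in data-srcset is followed by whitespace and an Nw descriptor, as in the scraped tag. widest_candidate is a hypothetical name:

```python
import re


def widest_candidate(srcset):
    """Return the URL with the largest width descriptor, or None.

    Matches "URL <digits>w" pairs; also returns None when srcset is
    missing (image.get('data-srcset') gives None if the attribute is absent).
    """
    if not srcset:
        return None
    pairs = re.findall(r"(https?://\S+)\s+(\d+)w", srcset)
    if not pairs:
        return None
    # Compare widths numerically, not as strings ("458" vs "90")
    url, _ = max(pairs, key=lambda pair: int(pair[1]))
    return url


srcset = (
    "http://example.com/s180/photo.jpg 180w, "
    "http://example.com/s458/photo.jpg 458w, "
    "http://example.com/s390/photo.jpg 390w"
)
print(widest_candidate(srcset))
```

On the real page this would be called as widest_candidate(image.get('data-srcset')). It returns a single link, or None when nothing usable is found.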