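I was scraping this site for its headlines, and I also tried to scrape the image that goes with each headline. The scrape returned the following markup: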
<div itemscope itemtype="https://schema.org/ItemList" class="group card-8-group-1 clearfix">
<meta itemprop="itemListOrder" content="https://schema.org/ItemListOrderDescending" />
<article itemprop="itemListElement" itemscope itemtype="https://schema.org/Article" class="card card-1 news-card-1 card-type-article type-article" data-sponsorship-type="card" data-sponsorship-article-id="1qo8sz0z1kaqb1dpj038v8658h" data-sponsorship-article-type="article" data-sponsorship-primary-tag="1pgecmpab62ei1akyb084izq3o" data-sponsorship-secondary-tag="22doj4sgsocqpxw45h607udje">
<a data-side="link" href="/en/news/spurs-investigation-aurier-appears-break-lockdown-protocols/1qo8sz0z1kaqb1dpj038v8658h" itemprop="url" data-sponsorship-slot="card" data-sponsorship-slot-id="front" class="type-article">
<div class="picture article-image" data-module="responsive-picture">
<img class="picture__image picture__image--lazyload" data-srcset="&quality=60&w=640 320w,&quality=60&w=560 480w,&quality=60&w=690 740w,&quality=60&w=800 980w,&quality=60&w=970 1580w" />
<noscript class="picture__polyfill"> <img src="https://images.daznservices.com/di/library/GOAL/5f/da/serge-aurier_191f5i34z69us1fausrs9k0mjk.jpg?t=1445827096&quality=60&h=170" alt="Serge Aurier" /> </noscript>
</div>
<div class="title">
<h3 title="Spurs launch investigation as Aurier appears to break lockdown protocols for a third time" itemprop="headline">Aurier appears to break lockdown protocols for a third time</h3>
<div class="image" data-sponsorship-slot="card" data-sponsorship-slot-id="image"></div>
</div>
The page appears to use lazy loading. My question: how do I extract the img at its full size?
Answer 0 (score: 1)
To get the full-size image, simply replace w=55 with w=970 in the image URL.
For example:
import requests
from bs4 import BeautifulSoup

url = 'https://www.goal.com/en/premier-league/2kwbbcootiqqgmrzs6o5inle5'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# zip() pairs the n-th headline with the n-th image found in the article cards.
for title, image in zip(soup.select('.card-type-article h3'),
                        soup.select('.card-type-article img')):
    title = title.get_text(strip=True)
    # Swap the width parameter to request the large rendition.
    full_img_url = image['src'].replace('w=55', 'w=970')
    print('{:<70}{}'.format(title, full_img_url))
Prints:
Wenger calls for FFP reform amid Newcastle takeover talk https://images.daznservices.com/di/library/GOAL/63/cd/arsene-wenger-2019_13luew9ltpa2g1l1r6ziuxpwbw.jpg?t=1363081390&quality=60&w=970
'Special Havertz is half-Ozil, half-Ballack & would thrive in PL' https://images.daznservices.com/di/library/GOAL/cc/18/kai-havertz_7sugon9o7ljy1fg2xzkv1mqcm.jpg?t=-1186202400&quality=60&w=970
Solskjaer: I'd rather a hole in my squad than an asshole https://images.daznservices.com/di/library/GOAL/78/f2/ole-gunnar-solskjaer-manchester-united-2019-20_1vfk6liknrjlx1r8aumegh4cxe.jpg?t=-749345265&quality=60&w=970
Maguire praises Man Utd's 'safe' training return https://images.daznservices.com/di/library/GOAL/5d/e8/harry-maguire-man-utd_13ewrih27ahmb13i1zxfjrhrp8.jpg?t=-444094625&quality=60&w=970
Jorginho's agent opens door for Juve move https://images.daznservices.com/di/library/GOAL/69/da/jorginho-chelsea-2019-20_15zh5m3ojefx0zl1ei7qsyc14.jpg?t=-1675997073&quality=60&w=970
Premier League clubs near approval for contact training https://images.daznservices.com/di/library/GOAL/79/ce/mohamed-salah-dejan-lovren-liverpool-training_7zq70upa8l1618svdzls077xn.jpg?t=143669454&quality=60&w=970
Ceballos reiterates desire to succeed at Real Madrid https://images.daznservices.com/di/library/GOAL/97/c6/dani-ceballos-arsenal_1sywf8w828w4b193xoz5c82uuf.jpg?t=-1552361252&quality=60&w=970
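The string replacement above only works when the URL contains exactly w=55. The noscript fallback in the question's markup, for instance, carries h=170 instead. As a rough sketch of a more tolerant variant, the query string can be rewritten with the standard library. The helper name with_width is hypothetical, and dropping h while setting only w is an assumption based on the URLs printed above, which carry w=970 and no h.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_width(url, width=970):
    """Return url with its size hint replaced by w=<width> (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query parameter except existing size hints (w or h).
    # Assumption: the image server accepts w on its own, as in the output above.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key not in ('w', 'h')]
    query.append(('w', str(width)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == '__main__':
    # URL taken from the <noscript> fallback image in the question.
    fallback = ('https://images.daznservices.com/di/library/GOAL/5f/da/'
                'serge-aurier_191f5i34z69us1fausrs9k0mjk.jpg'
                '?t=1445827096&quality=60&h=170')
    print(with_width(fallback))
```

In the loop above, with_width(image['src']) could then stand in for the .replace('w=55', 'w=970') call.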