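I think my problem is that the JavaScript running on the page doesn't load the images until I scroll down. Can someone help me? The script runs fine until it reaches "Zendikar Rising (ZNR)", a page with a lot of images. Then I'm told it failed to save the image Makindi Ox (ZNR).png from url ... The message should show a URL there, but it returns ''. I added some debugging code to skip past the missing card URLs, but then I lose a lot of cards.
I tried removing the empty fields, but if you run the script you'll see that my card names and URLs have equal counts (some of the URLs are blank), so removing the empty URLs throws off the totals and causes me to lose cards from the set.
Here is the code in question: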
import requests
import os
from os.path import basename
from bs4 import BeautifulSoup

path = os.getcwd()
print("The current working directory is %s" % path)
url = 'https://scryfall.com/sets'
r = requests.get(url).text
soup = BeautifulSoup(r, 'html.parser')
####################GATHERS ALL URLS FROM SET DIRECTORY#####################
links = []
Urls = []
for link in soup.find_all('a'):
    links.append(link.get('href'))
for link in links:
    if link is not None:
        if 'https://scryfall.com/sets/' in link:
            if link not in Urls:
                Urls.append(link)
#################START OF ALL URL LOOPS################################
for Url in Urls:  ## goes through all the URLs gathered from the sets links
    r = requests.get(Url).text
    soup = BeautifulSoup(r, 'html.parser')
    temp = soup.find('h1', {'class': 'set-header-title-h1'}).contents
    temp = ''.join(temp)
    temp = temp.strip()
    temp = temp.replace(':', '')
    temp = temp.replace(' ', '')
    test2 = (f"{path}\\{temp}")
    #############################################MAKE DIRECTORY FOR SET FOLDERS##################
    try:
        os.mkdir(test2)
    except OSError:
        print("Creation of the directory %s failed" % test2)
    else:
        print("Successfully created the directory %s " % test2)
    ############################################GATHER ALL IMAGES####################
    images = soup.find_all('img')
    pictures = []  ## stores all the picture URLs
    names = []  ## stores all the names
    for image in images[:-1]:
        names.append(image.get('alt'))
        pictures.append(image.get('src'))  ## comes back as '' for lazy-loaded images
    ####################SAVES ALL IMAGES AS FILES#################
    x = 0
    for i in pictures:
        fn = names[x] + '.png'
        try:
            ## the with block closes the file, so no explicit close is needed
            with open(f'{test2}\\' + basename(fn), "wb") as f:
                f.write(requests.get(i).content)
            ##print(i)
            ##print(f'saved {fn} to {path}')
            x += 1
        except OSError:
            ## debugging output: report the failure and the list sizes, then stop
            print(f"Failed to save image{fn} from url{i}")
            print(len(pictures))
            print(len(names))
            exit()
    ##################RESETS IMAGES AND NAMES FOR NEXT SET FOLDER#############
    pictures.clear()
    names.clear()
print("Completed With No Errors")
Answer 0 (score: 1)
Indeed, the images are lazy-loaded by a JS script: the <img> tags further down the page have no usable src attribute.
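However, the solution is quite simple. If you look at the <img> tags that have not been loaded yet, you'll see that the image link is not in the src attribute but in the data-src attribute.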
For example:
<img alt="Wayward Guide-Beast (ZNR)" class="card znr border-black" data-component="lazy-image" data-src="https://c1.scryfall.com/file/scryfall-cards/normal/front/e/b/ebfe94fc-7a98-4f53-8fd0-f5fd016b1873.jpg?1599472001" src="" title="Wayward Guide-Beast (ZNR)"/>
So all you need to do is check whether src is empty and, if it is, scrape the data-src attribute instead.
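
That check could look roughly like the sketch below. It is not the answerer's exact code: the helper names pick_image_url and collect_cards, and the sample markup with its example.com URLs, are assumptions made for illustration. The helper prefers src and falls back to data-src, and each card name is kept paired with its URL so the two can no longer drift apart when a blank entry is skipped.

```python
def pick_image_url(tag):
    """Return the tag's image URL, preferring src and falling back to data-src.

    `tag` is anything with a dict-style .get(), such as a BeautifulSoup Tag.
    Returns None when neither attribute holds a URL.
    """
    return tag.get('src') or tag.get('data-src') or None


def collect_cards(images):
    """Pair each image's alt text with its URL, skipping images that lack either."""
    cards = []
    for image in images:
        name = image.get('alt')
        link = pick_image_url(image)
        if name and link:
            cards.append((name, link))
    return cards


if __name__ == "__main__":
    # Imported here so the helpers above carry no third-party dependency.
    from bs4 import BeautifulSoup

    # Hypothetical sample markup: one lazy-loaded tag and one already-loaded tag.
    html = (
        '<img alt="Lazy Card (ZNR)" data-src="https://example.com/lazy.jpg" src=""/>'
        '<img alt="Loaded Card (ZNR)" src="https://example.com/loaded.jpg"/>'
    )
    soup = BeautifulSoup(html, 'html.parser')
    for name, link in collect_cards(soup.find_all('img')):
        print(name, link)
```

In the question's script, pick_image_url(image) would take the place of image.get('src') when filling pictures. Looping over (name, link) pairs also removes the need for the separate x counter.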