Question

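I'm trying to download Magic: The Gathering card images from scryfall.com. They provide a JSON file with all the information about every card (including the URL of its image). So I wrote code that reads each URL from that JSON file and tries to save the image. The problem is that the request part of the code takes more than 5 minutes per image, and I don't know why. (Every image I'm trying to fetch is smaller than 100 kB and opens instantly in a browser.)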

I tried both urllib.urlretrieve and urllib2.urlopen, and they behave the same. I also tried running it under both Python 2 and Python 3.

There is no error message. The code actually works; it just takes a very long time to move on to the next image.
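A minimal diagnostic sketch, assuming Python 3: it times one download with `time.perf_counter` and passes an explicit `timeout` to `urllib.request.urlopen`, so a connection that stalls raises an exception instead of silently blocking for minutes. The helper name `timed_fetch` and the 30-second default are illustrative choices, not part of the original code.

```python
import time
import urllib.request


def timed_fetch(url, timeout=30.0):
    """Download `url` and return (body_bytes, elapsed_seconds).

    Hypothetical helper. `timeout` is handed to urllib.request.urlopen and
    applies to each blocking socket operation, so a stalled server raises
    an exception (e.g. urllib.error.URLError or a timeout error).
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return body, time.perf_counter() - start
```

For example, calling `timed_fetch` on one of the image URLs from the JSON file returns the image bytes together with how long the request took, which shows whether the delay is in the network request or elsewhere in the script.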


Edit 2: the JSON file I'm using: https://archive.scryfall.com/json/scryfall-default-cards.json

Answer 1

This code fetches the image in under 1 second:

import requests

url = 'https://img.scryfall.com/cards/normal/front/2/c/2c23b39b-a4d6-4f10-8ced-fa4b1ed2cf74.jpg?1561567651'
r = requests.get(url)

with open('image.jpg', 'wb') as f:
    f.write(r.content)

The same goes for this code:

import urllib.request

url = 'https://img.scryfall.com/cards/normal/front/2/c/2c23b39b-a4d6-4f10-8ced-fa4b1ed2cf74.jpg?1561567651'
urllib.request.urlretrieve(url, 'image.jpg')

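I didn't test more images. Perhaps the problem is that the server sees too many requests from one IP address in a short time and then blocks them.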
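If rate limiting is the cause, one option is to space the requests out. Below is a minimal sketch, assuming Python 3, of a sequential downloader that sleeps between requests. `download_politely`, the 0.1-second delay and the 30-second timeout are hypothetical values chosen for illustration, not documented Scryfall limits.

```python
import os
import time
import urllib.request


def download_politely(jobs, dest_dir, delay=0.1, timeout=30.0):
    """Download each (url, filename) pair in `jobs` into `dest_dir`.

    Hypothetical helper: sleeps `delay` seconds between consecutive requests
    (an assumed value) and returns the list of saved file paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for i, (url, filename) in enumerate(jobs):
        if i > 0:
            time.sleep(delay)  # space out consecutive requests to the server
        path = os.path.join(dest_dir, filename)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
    return saved
```

It takes a list of `(url, filename)` pairs, e.g. `download_politely(jobs, 'imgs')`. If downloads become fast with a delay and slow without it, that points toward throttling on the server side.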

Edit: I used this code to download 10 images and print the timings:

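import json
import os
import time
import urllib.request

# urlretrieve does not create directories, so make sure 'imgs/' exists
os.makedirs('imgs', exist_ok=True)

print('load json')

start = time.time()
with open("scryfall-default-cards.json", encoding="utf-8") as f:
    content = json.load(f)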
end = time.time()
print('time:', end-start)

# ---

start = time.time()

all_urls = len(content)

urls_to_download = 0
for item in content:
    if item['layout'] == 'normal' and item['digital'] is False:
        urls_to_download += 1

print('urls:', all_urls, urls_to_download)

end = time.time()
print('time:', end-start)

# ----

start = time.time()
count = 0
for item in content:
    if item['layout'] == 'normal' and item['digital'] is False:
        count += 1
        url = item['image_uris']['normal']
        name = url.split('?')[0].split('/')[-1]
        print(name)
        urllib.request.urlretrieve(url, 'imgs/' + name)
    if count >= 10:
        break
end = time.time()
print('time:', end-start)
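The URL selection in the script above can also be pulled into a small function. This sketch, assuming Python 3, keeps the same `layout`/`digital` filter, uses `urllib.parse.urlsplit` instead of string splitting to drop the `?1561567651`-style query string, and stops at exactly `limit` entries. The name `image_jobs` is hypothetical, and skipping cards without an `image_uris` entry is a defensive assumption.

```python
import posixpath
import urllib.parse


def image_jobs(cards, limit=None):
    """Return (url, filename) pairs for non-digital, normal-layout cards."""
    jobs = []
    for card in cards:
        if limit is not None and len(jobs) >= limit:
            break  # collected enough entries
        if card.get('layout') != 'normal' or card.get('digital') is not False:
            continue
        url = (card.get('image_uris') or {}).get('normal')
        if not url:
            continue  # assumption: some cards may lack image URLs
        # urlsplit separates the query string; basename keeps just the file name
        filename = posixpath.basename(urllib.parse.urlsplit(url).path)
        jobs.append((url, filename))
    return jobs
```

With the loaded JSON, `image_jobs(content, limit=10)` yields the first ten `(url, filename)` pairs, which can then be passed to `urllib.request.urlretrieve` as in the loop above.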

Results:

load json
time: 3.9926743507385254
urls: 47237 41805
time: 0.054879188537597656
2c23b39b-a4d6-4f10-8ced-fa4b1ed2cf74.jpg
37bc0128-a8d0-477c-abcf-2bdc9e38b872.jpg
2ae1bb79-a931-4d2e-9cc9-a06862dc5cde.jpg
4889a668-0f01-4447-ad2e-91b329258f22.jpg
5b13ba5a-f4b0-420a-9e4f-a65e57721fa4.jpg
893b309d-5e8f-47fa-9f54-eaf16a5f96e3.jpg
27d30285-7729-4130-a768-71867aefe9b3.jpg
783616d6-e3ea-43fd-97eb-6e4c5a2c711f.jpg
cc101b90-3e17-4beb-a606-3e76088e362c.jpg
36da00e3-3ef6-4ad5-a53d-e71cfdafc1e6.jpg
42e1033b-383e-49b4-875f-ccdc94e08c9d.jpg
time: 2.656561851501465

Answer 2

Here is a very simple and effective way to grab these images quickly. I didn't time it, but it also takes less than a second.


Downloading images with Python (urllib / urllib2) is very slow
