Downloading images from JSON URLs with urllib

Date: 2018-02-23 10:23:10

Tags: python json urllib

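Thanks for reading this post.

I want to download images from URLs stored in a JSON file, and I don't understand why the retrieve function fails to read my URL. When I paste the URL directly into the retrieve call, like this: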

testfile.retrieve("http://fishbase.org/images/thumbnails/jpg/tn_Quatr_f0.jpg", "tmp/images/full/fish")

it works perfectly. My script, however, does not:

import urllib
import json

with open('fiche.json') as json_data:
    d = json.load(json_data)
    for obj in d:
        name = json.dumps(obj['taxonomy'][0])
        url = json.dumps(obj['image_urls'][0])
        print(name)
        print(url)
        testfile = urllib.URLopener()
        testfile.retrieve(url, "tmp/images/full/fish") 

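I know that, as written, it would only keep the last image from my JSON, since every download goes to the same path. I'll fix that once the first problem is solved.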

JSON:

[
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=16520", "taxonomy": ["Quassiremus ascensionis"], "image_urls": ["http://fishbase.org/images/thumbnails/gif/tn_OPHICHT0.gif"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=14873", "taxonomy": ["Quinca mirifica"], "image_urls": ["http://fishbase.org/images/thumbnails/gif/tn_APOGONT0.gif"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=27173", "taxonomy": ["Quassiremus polyclitellum"], "image_urls": ["http://fishbase.org/images/thumbnails/gif/tn_OPHICHT0.gif"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=3896", "taxonomy": ["Quietula y-cauda"], "image_urls": ["http://fishbase.org/images/thumbnails/gif/tn_GOBIIDT0.gif"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=25547", "taxonomy": ["Quassiremus evionthas"], "image_urls": ["http://fishbase.org/images/thumbnails/jpg/tn_Quevi_u0.jpg"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=62532", "taxonomy": ["Quietula guaymasiae"], "image_urls": ["http://fishbase.org/images/thumbnails/jpg/tn_Qugua_u0.jpg"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=13924", "taxonomy": ["Quassiremus nothochir"], "image_urls": ["http://fishbase.org/images/thumbnails/jpg/tn_Qunot_u1.jpg"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=62338", "taxonomy": ["Qianlabeo striatus"], "image_urls": ["http://fishbase.org/images/thumbnails/gif/tn_CYPRINT0.gif"]},
{"fish_url": "http://fishbase.org/Summary/SpeciesSummary.php?id=27728", "taxonomy": ["Quintana atrizona"], "image_urls": ["http://fishbase.org/images/thumbnails/jpg/tn_Quatr_f0.jpg"]}
]

The output of my script is:

"Quassiremus ascensionis"
"http://fishbase.org/images/thumbnails/gif/tn_OPHICHT0.gif"
Traceback (most recent call last):
  File "dlimg.py", line 12, in <module>
    testfile.retrieve(url, "tmp/images/full/fish")
  File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 210, in open
    return self.open_unknown(fullurl, data)
  File "/usr/lib/python2.7/urllib.py", line 222, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: '%22http'

I spent an hour searching for a solution but couldn't find anything.

Thanks for your answers :)

1 answer:

Answer 0 (score: 0)

It looks like there are double quotes (") inside your URL:

IOError: [Errno url error] unknown url type: '%22http'

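Notice the %22http: %22 is the percent-encoded form of a double quote, so the URL string itself starts with a literal " character.

Try this: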
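To see where the quotes come from, here is a small sketch (my own illustration, not part of the question's script; the URL is copied from the first record of the JSON above). After json.load the value is already a plain string, and json.dumps turns it back into JSON text, which wraps it in double quotes. The import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
import json

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# A URL as it comes out of json.load: a plain string, no quotes around it.
url = "http://fishbase.org/images/thumbnails/gif/tn_OPHICHT0.gif"

# json.dumps serializes the string back to JSON text, adding the quotes.
dumped = json.dumps(url)
print(dumped)

# Percent-encoding a double quote gives the %22 seen in the error message.
print(quote('"'))

# json.loads undoes json.dumps and returns the original string.
print(json.loads(dumped) == url)
```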

testfile.retrieve(url.replace('"',''), "tmp/images/full/fish") 

I hope that solves the problem.
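Another option, probably cleaner than stripping the quotes afterwards, is to not call json.dumps at all and use the value from json.load directly. The following is only a sketch under a few assumptions: plan_downloads and download_all are hypothetical helper names, the "index_name.ext" file naming is my own choice so that images no longer overwrite each other, and urlretrieve is imported with a fallback so it runs on Python 2 or 3.

```python
import json
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2


def plan_downloads(records, dest_dir):
    """Return a list of (url, destination_path) pairs, one per record."""
    plan = []
    for index, obj in enumerate(records):
        urls = obj.get('image_urls') or []
        if not urls:
            continue  # skip records without an image
        url = urls[0]  # already a plain string; no json.dumps needed
        ext = os.path.splitext(url)[1]  # e.g. ".gif" or ".jpg"
        name = obj['taxonomy'][0].replace(' ', '_')
        filename = '%d_%s%s' % (index, name, ext)
        plan.append((url, os.path.join(dest_dir, filename)))
    return plan


def download_all(json_path, dest_dir):
    """Download the first image of every record into dest_dir."""
    with open(json_path) as json_data:
        records = json.load(json_data)
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    for url, path in plan_downloads(records, dest_dir):
        urlretrieve(url, path)
```

With the file from the question this would be called as download_all('fiche.json', 'tmp/images/full'). The index prefix keeps file names unique even when two species share the same taxonomy string.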