Question

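I have written a Python script that extracts image URLs from some JavaScript and saves the images. However, when I open one of them with Preview, I get the message: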

The file may be damaged or use a file format that Preview doesn't recognize.

When I open the .jpeg in an editor and look closely, it seems the script is saving HTML. Where am I going wrong? Any help would be appreciated.

Answer 1

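It seems that item['large'] is not an image link. I ran your code in a notebook, and when I clicked the link printed in the notebook output, it took me to another web page. So you need to go one level deeper here. You can modify the loop like this: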

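# Assumed imports, matching the names used below
from urllib import request
from bs4 import BeautifulSoup

# json_str is the list of image entries already parsed in your script
for item in json_str:
    print(item['large'])

    # item['large'] points to an HTML page, so fetch and parse that page first
    r = request.urlopen(item['large'])
    s = BeautifulSoup(r, 'html.parser')

    filename = item['large'].split('/')[-1]
    # Use the src of the page's iframe as the URL to download
    req = request.Request(
        s.find('iframe').get('src'),
        headers={
            'User-Agent':
                'Mozilla/5.0 (Windows NT 5.1; rv:43.0) Gecko/20100101 Firefox/43.0'})
    resp = request.urlopen(req)
    with open(filename, "wb") as fd:
        fd.write(resp.read())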
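
If you want to catch this kind of problem before a file is written, one option is to check the first bytes of the response. The sketch below is only an illustration: the helper names looks_like_image and looks_like_html are hypothetical, and it only knows the JPEG, PNG, and GIF signatures.

```python
def looks_like_image(data: bytes) -> bool:
    """Return True if data starts with a JPEG, PNG, or GIF file signature."""
    signatures = (
        b"\xff\xd8\xff",        # JPEG
        b"\x89PNG\r\n\x1a\n",   # PNG
        b"GIF87a",              # GIF (1987 version)
        b"GIF89a",              # GIF (1989 version)
    )
    return data.startswith(signatures)


def looks_like_html(data: bytes) -> bool:
    """Return True if data appears to begin with an HTML document."""
    head = data[:512].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

In the loop above, you could read resp.read() into a variable and write it to disk only when looks_like_image returns True, printing a warning when looks_like_html returns True instead.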

Python image scraping returns HTML
