Question

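I am using bs4 in Python to download a wallpaper from nmgncp.com. However, the code only downloads a 16 KB file, while the full image is about 300 KB. Please help. I have also tried the wget.download method.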

PS: I am using Python 3.6 on Windows 10.

Here is my code:

from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os


url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'

html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com/'+a
print(newurl)

response = requests.get(newurl)
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)

Answer 1

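The root cause of your problem is hotlink protection: the image URL requires a Referer header, otherwise the server redirects the request to an HTML page. The 16 KB file you saved is most likely that HTML page rather than the image.

Fixed source code: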

from bs4 import BeautifulSoup
import requests

# Wallpaper page that embeds the image
url = 'http://www.nmgncp.com/dark-wallpaper-1920x1080.html'

html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
# src of the first <img> on the page; it appears to begin with "/",
# so the host below has no trailing slash
a = soup.find_all('img')[0].get('src')
newurl = 'http://www.nmgncp.com' + a
print(newurl)

# Send a Referer header so the server returns the image instead of redirecting to HTML
response = requests.get(newurl, headers={'referer': newurl})
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
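
A quick way to confirm this diagnosis is to look at the first bytes of what was actually downloaded. The sketch below is only an illustration: the helper names looks_like_jpeg and looks_like_html are hypothetical and not part of requests or bs4, and the HTML check is a rough heuristic, not a full content sniffer.

```python
def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG start-of-image marker."""
    return data[:3] == b"\xff\xd8\xff"


def looks_like_html(data):
    """Return True if the bytes look like the beginning of an HTML document."""
    # Heuristic only: skip leading whitespace and compare case-insensitively.
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

Calling looks_like_html(response.content) before writing the file would reveal a redirect to the HTML page, which a status code check alone does not catch because the redirected page also returns 200.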

Answer 2

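First, http://www.nmgncp.com/dark-wallpaper-1920x1080.html is an HTML document. Second, when you try to download the image through its direct URL (for example: http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg), the server also redirects you to an HTML document. This is most likely because the hoster (nmgncp.com) does not want to serve direct links to its images. It can tell whether an image is being requested directly by looking at the HTTP Referer header and deciding whether it is valid. So in this case you need to put in a little more effort to make the host believe you are a valid caller of the direct URL.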
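
One way to look like a valid caller is to send the wallpaper page itself as the Referer, which is what a browser would do. The following is a minimal sketch under assumptions, not a confirmed recipe for this site: build_image_request and download_image are hypothetical helper names, it is assumed that the host accepts the page URL as a referer, and session is expected to be something like requests.Session().

```python
from urllib.parse import urljoin


def build_image_request(page_url, img_src):
    """Return (absolute_image_url, headers) for an image found on page_url."""
    # urljoin resolves relative and absolute src values and avoids the
    # doubled or missing slashes that string concatenation can produce.
    image_url = urljoin(page_url, img_src)
    # Assumption: the host treats the wallpaper page as a valid referer.
    headers = {"Referer": page_url}
    return image_url, headers


def download_image(session, page_url, img_src, dest_path):
    """Fetch the image with a Referer header and save it to dest_path."""
    image_url, headers = build_image_request(page_url, img_src)
    response = session.get(image_url, headers=headers)
    response.raise_for_status()
    # A redirect to the HTML page still returns 200, so check the type too.
    content_type = response.headers.get("Content-Type", "")
    if not content_type.startswith("image/"):
        raise ValueError("expected an image, got %r" % content_type)
    with open(dest_path, "wb") as f:
        f.write(response.content)
    return image_url
```

With requests installed, a call might look like download_image(requests.Session(), url, a, "wallpaper.jpg"), where url and a are the page URL and the img src from the question's code. If the host also checks cookies or the User-Agent, more headers may be needed; that is not something the thread confirms.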

Unable to download the full file in Python
