I'm trying to create a Python function that does the same thing as this wget command:
wget -c --read-timeout=5 --tries=0 "$URL"
-c
- Continue from where the download left off if it gets interrupted.
--read-timeout=5
- If no new data arrives for more than 5 seconds, give up and retry. Given -c,
this means it will try again from where it left off.
--tries=0
- Retry forever.
Used together, these three options result in a download that cannot fail.
I want to replicate those features in my Python script, but I don't know where to start...
Answer 0 (score: 74)
There is also a nice Python module named wget that is quite easy to use. Found here.
This demonstrates the simplicity of the design:
>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'
Enjoy.
However, if wget doesn't work (I've had trouble with certain PDF files), try this solution.
Edit: You can also use the out parameter to use a custom output directory instead of the current working directory.
>>> output_directory = <directory_name>
>>> filename = wget.download(url, out=output_directory)
>>> filename
'razorback.mp3'
Answer 1 (score: 22)
urllib.request should work. Just set it up in a while (not finished) loop, check whether a local file already exists, and if it does, send a GET with a RANGE header specifying how far into the download the local file got. Be sure to use read() to append to the local file until an error occurs.
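The approach described above can be sketched like this (a minimal sketch, not the answerer's code; `resume_download` is a name I made up, and real servers may ignore the Range header, which the sketch handles by restarting from scratch):

```python
import os
import urllib.error
import urllib.request

def resume_download(url, local_path, read_timeout=5):
    """Fetch the missing bytes of `url` into `local_path`, resuming if the
    file already exists. Returns True when the download is complete."""
    # How much do we already have locally?
    start = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % start})
    try:
        with urllib.request.urlopen(req, timeout=read_timeout) as resp:
            # A 200 (rather than 206) response means the server ignored the
            # Range header and sent the whole file, so rewrite, don't append.
            mode = "ab" if resp.status == 206 else "wb"
            with open(local_path, mode) as f:
                while True:
                    chunk = resp.read(8192)
                    if not chunk:  # EOF: transfer finished
                        return True
                    f.write(chunk)
    except urllib.error.HTTPError as e:
        # 416 "Range Not Satisfiable" usually means the file is already complete
        return e.code == 416
    except OSError:
        return False  # timed out or connection dropped; the caller can retry

# Retry forever, like wget --tries=0:
#   while not resume_download(url, "local/index.html"):
#       pass
```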
This may also be a duplicate of Python urllib2 resume download doesn't work when network reconnects.
Answer 2 (score: 15)
import urllib2

attempts = 0

while attempts < 3:
    try:
        response = urllib2.urlopen("http://example.com", timeout=5)
        content = response.read()
        f = open("local/index.html", 'w')
        f.write(content)
        f.close()
        break
    except urllib2.URLError as e:
        attempts += 1
        print type(e)
Answer 3 (score: 8)
I had to do something like this on a version of Linux that didn't have the right options compiled into wget. This example is for downloading the memory-analysis tool 'guppy'. I'm not sure whether it matters, but I kept the target file's name the same as the target name in the URL...
Here's what I came up with:
python -c "import requests; r = requests.get('https://pypi.python.org/packages/source/g/guppy/guppy-0.1.10.tar.gz') ; open('guppy-0.1.10.tar.gz' , 'wb').write(r.content)"
That's a one-liner; here it is again in a slightly more readable form:
import requests
fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname
r = requests.get(url)
open(fname, 'wb').write(r.content)
This worked for downloading a tarball. I was able to extract the package and work with it after downloading it.
Edit:
To address the question, here is an implementation with a progress bar printed to STDOUT. There is probably a more portable way to do this without the clint package, but this has been tested on my machine and works fine:
#!/usr/bin/env python
from clint.textui import progress
import requests

fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

r = requests.get(url, stream=True)
with open(fname, 'wb') as f:
    total_length = int(r.headers.get('content-length'))
    for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length/1024) + 1):
        if chunk:
            f.write(chunk)
            f.flush()
Answer 4 (score: 2)
Here is the code adopted from the torchvision library:
import os
import urllib.request
import urllib.error

def download_url(url, root, filename=None):
    """Download a file from a url and place it in root.

    Args:
        url (str): URL to download file from
        root (str): Directory to place downloaded file in
        filename (str, optional): Name to save the file under. If None, use the basename of the URL
    """
    root = os.path.expanduser(root)
    if not filename:
        filename = os.path.basename(url)
    fpath = os.path.join(root, filename)

    os.makedirs(root, exist_ok=True)

    try:
        print('Downloading ' + url + ' to ' + fpath)
        urllib.request.urlretrieve(url, fpath)
    except (urllib.error.URLError, IOError) as e:
        if url[:5] == 'https':
            url = url.replace('https:', 'http:')
            print('Failed download. Trying https -> http instead.'
                  ' Downloading ' + url + ' to ' + fpath)
            urllib.request.urlretrieve(url, fpath)
If a dependency on the torchvision library is acceptable, you can also simply do:
from torchvision.datasets.utils import download_url
download_url('http://something.com/file.zip', '~/my_folder')
Answer 5 (score: 1)
Easy as py:

class Downloader():
    def download_manager(self, url, destination='Files/DownloaderApp/', try_number="10", time_out="60"):
        #threading.Thread(target=self._wget_dl, args=(url, destination, try_number, time_out)).start()
        if self._wget_dl(url, destination, try_number, time_out) == 0:
            return True
        else:
            return False

    def _wget_dl(self, url, destination, try_number, time_out):
        import subprocess
        command = ["wget", "-c", "-P", destination, "-t", try_number, "-T", time_out, url]
        try:
            download_state = subprocess.call(command)
        except Exception as e:
            print(e)
            download_state = 1
        # if download_state == 0 => successful download
        return download_state
Answer 6 (score: 0)
Let me improve on an example with threads, in case you want to download many files.
import math
import random
import threading

import requests
from clint.textui import progress

# You must define a proxy list
# I suggest https://free-proxy-list.net/
proxies = {
    0: {'http': 'http://34.208.47.183:80'},
    1: {'http': 'http://40.69.191.149:3128'},
    2: {'http': 'http://104.154.205.214:1080'},
    3: {'http': 'http://52.11.190.64:3128'}
}

# You must define the list of files you want to download
videos = [
    "https://i.stack.imgur.com/g2BHi.jpg",
    "https://i.stack.imgur.com/NURaP.jpg"
]

downloaderses = list()

def downloaders(video, selected_proxy):
    print("Downloading file named {} by proxy {}...".format(video, selected_proxy))
    r = requests.get(video, stream=True, proxies=selected_proxy)
    nombre_video = video.split("/")[3]
    with open(nombre_video, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
            if chunk:
                f.write(chunk)
                f.flush()

for video in videos:
    selected_proxy = proxies[math.floor(random.random() * len(proxies))]
    t = threading.Thread(target=downloaders, args=(video, selected_proxy))
    downloaderses.append(t)

for _downloaders in downloaderses:
    _downloaders.start()
Answer 7 (score: 0)
I often find that a simpler and more robust solution is to just execute a terminal command from within Python. In your case:
import os
url = 'https://www.someurl.com'
os.system(f'wget -c --read-timeout=5 --tries=0 "{url}"')
Answer 8 (score: 0)
For Windows and Python 3.x, here are my two cents on renaming the downloaded file:
pip install wget
import wget
wget.download('Url', 'C:\\PathToMyDownloadFolder\\NewFileName.extension')
A command-line example that actually works:
python -c "import wget; wget.download(""https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.17.2.tar.xz"", ""C:\\Users\\TestName.TestExtension"")"
Note: the 'C:\\PathToMyDownloadFolder\\NewFileName.extension' argument is not mandatory. By default the file is not renamed, and the download folder is your local path.
Answer 9 (score: -1)
TensorFlow makes life easier. The file path gives us the location of the downloaded file (the code for this answer was lost, but it presumably refers to a utility such as tf.keras.utils.get_file, which downloads a URL and returns the local path).