Finding the download speed for a progress bar

Time: 2016-06-11 05:21:35

Tags: python performance python-3.x download urllib

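I'm writing a script to download videos from a website. I've added a report hook to get the download progress, so it currently shows the percentage and the size of the downloaded data. I thought it would be interesting to add the download speed and an ETA as well.

The problem is that if I use a simple speed = chunk_size/time, the speed shown is accurate but jumps around like crazy. So I used a history of the times taken to download individual chunks, something like speed = chunk_size*n/sum(n_time_history). Now it shows a stable download speed, but it is certainly wrong, because its value is a few bits/s while the downloaded file is visibly growing at a much faster rate.

Can somebody tell me where I'm going wrong?

Here is my code.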

import sys
import time

# unitsize() and format_time() are helpers defined elsewhere in the script (not shown here).

def dlProgress(count, blockSize, totalSize):
    global init_count
    global time_history
    try:
        time_history.append(time.monotonic())
    except NameError:
        time_history = [time.monotonic()]
    try:
        init_count
    except NameError:
        init_count = count
    downloaded = count*blockSize
    percent = downloaded*100/totalSize
    dl, dlu = unitsize(downloaded)                  # returns size in kB, MB, GB, etc.
    tdl, tdlu = unitsize(totalSize)
    count -= init_count                             # because continuation of partial downloads is supported
    if count > 0:
        n = 5                                       # length of time history to consider
        _count = n if count > n else count
        time_history = time_history[-(_count+1):]   # _count+1 timestamps give _count intervals
        time_diff = [i-j for i,j in zip(time_history[1:],time_history[:-1])]
        speed = blockSize*_count / sum(time_diff)
    else: speed = 0
    n = int(percent//4)
    try:
        eta = format_time((totalSize-downloaded)//speed)
    except Exception:                               # e.g. speed == 0
        eta = '>1 day'
    speed, speedu = unitsize(speed, True)           # returns speed in B/s, kB/s, MB/s, etc.
    sys.stdout.write("\r{:.1f}% |{}{}| {}{}/{}{} {}{} {}".format(
        percent, "#"*n, " "*(25-n), dl, dlu, tdl, tdlu, speed, speedu, eta))
    sys.stdout.flush()
  

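Edit
  Corrected the logic. The download speed shown is now much better.
  As I increase the length of the history used to calculate the speed, the stability increases, but sudden changes in speed (if the download stops, etc.) aren't shown.
  How do I make it stable, yet sensitive to large changes?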

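I realize the question is now more math oriented, but it would be great if somebody could help me out or point me in the right direction.
Also, please tell me if there is a more efficient way to accomplish this.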

1 answer:

Answer 0 (score: 1)

_count = n if count > n else count
time_history = time_history[-_count:]
time_weights = list(range(1, len(time_history)))  # simple linear weights, the newest interval counts most
time_diff = [(i-j)*k for i,j,k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)
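To make the weighting above concrete, here is a minimal self-contained sketch of the same idea. The function name `weighted_speed` and the sample timestamps are illustrative assumptions, not part of the original script; it takes a list of block-arrival timestamps instead of relying on globals.

```python
def weighted_speed(timestamps, block_size):
    """Bytes per second from recent block-arrival timestamps.

    Each interval between consecutive timestamps gets a linear weight:
    1 for the oldest interval, rising by 1 towards the newest.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not intervals:
        return 0.0
    weights = range(1, len(intervals) + 1)
    weighted_time = sum(w * dt for w, dt in zip(weights, intervals))
    if weighted_time <= 0:
        return 0.0
    return block_size * sum(weights) / weighted_time


# Hypothetical timestamps in seconds: four quick blocks, then one slow block.
stamps = [0.0, 0.25, 0.5, 0.75, 1.0, 3.0]
plain = 8192 * 5 / (stamps[-1] - stamps[0])   # unweighted average over the window
weighted = weighted_speed(stamps, 8192)
print(round(plain), round(weighted))
```

With this sample the weighted value comes out lower than the plain window average, because the most recent (slow) interval carries the largest weight.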

To make it even more stable, so that it does not react when the download speed briefly spikes up or down, you could also add this:

_count = n if count > n else count
time_history = time_history[-_count:]
time_history.remove(min(time_history))
time_history.remove(max(time_history))
time_weights = list(range(1, len(time_history)))  # simple linear weights, the newest interval counts most
time_diff = [(i-j)*k for i,j,k in zip(time_history[1:], time_history[:-1], time_weights)]
speed = blockSize*(sum(time_weights)) / sum(time_diff)

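This removes the highest and lowest peaks in time_history, which makes the displayed number more stable. Note that time_history holds monotonically increasing timestamps, so its min and max are simply the oldest and newest entries; to drop the fastest and slowest block, the filtering has to be applied to the intervals. If you want to be picky, you could generate the weights before removing anything, and then filter out the mapped values using time_diff.index(min(time_diff)).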
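One possible reading of that suggestion, sketched under assumptions: weights are assigned first, then the shortest and the longest interval are discarded together with their weights. The function name `trimmed_weighted_speed` and the sample data are hypothetical.

```python
def trimmed_weighted_speed(timestamps, block_size):
    """Linearly weighted speed after dropping the fastest and slowest interval."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Pair each interval with its weight *before* filtering: (weight, interval).
    pairs = list(enumerate(intervals, start=1))
    if len(pairs) > 2:
        fastest = min(pairs, key=lambda p: p[1])
        slowest = max(pairs, key=lambda p: p[1])
        # If every interval is equal, fastest and slowest are the same pair
        # and only that one pair is dropped.
        pairs = [p for p in pairs if p is not fastest and p is not slowest]
    total_weight = sum(w for w, _ in pairs)
    weighted_time = sum(w * dt for w, dt in pairs)
    if weighted_time <= 0:
        return 0.0
    return block_size * total_weight / weighted_time


# Hypothetical timestamps in seconds: four quick blocks, then one slow block.
stamps = [0.0, 0.25, 0.5, 0.75, 1.0, 3.0]
print(trimmed_weighted_speed(stamps, 8192))
```

Because the single slow interval is discarded as an outlier, the result here reflects only the steady 0.25 s blocks. That is the trade-off the question describes: more stability, less sensitivity to a sudden stall.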

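Using a non-linear function (like sqrt()) to generate the weights will also give you better results. Oh, and as I said in the comments: adding statistical methods to filter the times should be marginally better, but I doubt it is worth the overhead it would add.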
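As a rough illustration of swapping the weight function, the sketch below takes the weighting as a parameter. The name `speed_with_weights`, its `weight_fn` parameter, and the sample timestamps are assumptions made for this example only.

```python
import math


def speed_with_weights(timestamps, block_size, weight_fn=math.sqrt):
    """Weighted speed in bytes/s.

    weight_fn maps an interval's position (1 = oldest) to its weight.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    weights = [weight_fn(k) for k in range(1, len(intervals) + 1)]
    weighted_time = sum(w * dt for w, dt in zip(weights, intervals))
    if weighted_time <= 0:
        return 0.0
    return block_size * sum(weights) / weighted_time


# Hypothetical timestamps in seconds: four quick blocks, then one slow block.
stamps = [0.0, 0.25, 0.5, 0.75, 1.0, 3.0]
unweighted = speed_with_weights(stamps, 8192, weight_fn=lambda k: 1.0)
sqrt_weighted = speed_with_weights(stamps, 8192)               # sqrt weights
linear = speed_with_weights(stamps, 8192, weight_fn=float)     # weights 1, 2, 3, ...
print(round(unweighted), round(sqrt_weighted), round(linear))
```

For this sample the sqrt-weighted value lands between the unweighted average and the linearly weighted one, since sqrt weights grow more slowly than linear weights and so emphasize the newest interval less strongly.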