Why is len(<a list object>) so slow?

Date: 2016-02-03 23:06:14

Tags: python ipython timing

I'm running the following code in an ipython session:

# This call is slow, but that is expected. (It loads 3 GB of data.)
In [3]: arc, arc_sub, upls, go = foo_mod.ready_set()

# This call is also slow, as `upls` is huge.
In [4]: upls = list(upls)

# This call is slow in meatspace, but `%timeit` doesn't notice!
In [5]: %timeit -n1 -r1 len(upls)
1 loops, best of 1: 954 ns per loop

%timeit is straight-up lying here. With or without the %timeit, the command takes upwards of 10s to actually run. Only the first time, however; subsequent calls to len are quick.

Even time.time() sings a similar tune:

In [5]: import time

In [6]: s = time.time(); len_ = len(upls); e = time.time()

In [7]: e - s
Out[7]: 7.104873657226562e-05

But it took seconds in the real world for In [6] to actually complete. I just don't seem to be able to capture where the actual time is being spent!
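
A rough way to bracket whatever happens between statements (just a sketch, not something I've verified pinpoints the pause): take the timestamps as separate input lines, so that any work done at the prompt between statements (a GC pass, IPython's own bookkeeping) falls inside the measured window instead of outside it:

import time

# Sketch: run each of the following as its own input line.
t0 = time.time()

# ...next input line...
len(upls)

# ...next input line: anything that happened between the three statements
# (GC pauses, IPython result handling) is now included in the elapsed time.
print(time.time() - t0)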

There's nothing terribly unusual about the list, aside from the fact that it's huge: it's a real list; it holds ~¼ billion bson.ObjectId objects. (Prior to the list() call, it's a set object; that call is also slow, but that makes sense; list(<set instance>) is O(n), and my set is huge.)
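
For a sense of scale (my own back-of-envelope numbers, assuming a 64-bit CPython; not measured), the list's pointer array alone for that many entries is on the order of a gigabyte, before counting the ObjectId instances themselves:

# Back-of-envelope estimate (assumption: 64-bit CPython, 8-byte PyObject*
# slots; ignores list over-allocation and the ObjectId objects themselves).
n = 125636395            # the len() reported in the gc edit below
print(8 * n / 1e9)       # ~1.0 GB just for the list's pointer array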

Edit re GC

If I run gc.set_debug(gc.DEBUG_STATS) just prior to ready_set, which itself is a slow call, I see tons of GC cycles. This is expected. gen3 grows:

gc: objects in each generation: 702 701 3289802
gc: done, 0.0000s elapsed.
gc: collecting generation 0...
gc: objects in each generation: 702 1402 3289802
gc: done, 0.0000s elapsed.
gc: collecting generation 0...
gc: objects in each generation: 702 2103 3289802

Unfortunately the console output makes the runtime of this impossibly slow. If I instead delay the gc.set_debug call until just after ready_set, I don't see any GC cycles, but gc.get_count() claims the generations are tiny:

In [6]: gc.get_count()
Out[6]: (43, 1, 193)

In [7]: len(upls)
Out[7]: 125636395

(But why/how does get_count report fewer objects than what's in the list? They're definitely all unique, since they just went through a set…) The fact that involving gc in the code makes len speedy leads me to believe I'm being paused for a collect-the-world.
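
One way to test the collect-the-world hypothesis (a sketch only): disable automatic collection around the heavy calls. gc.disable() switches off only the cyclic collector, and reference counting still frees objects, so if the mystery pause disappears the collector is the likely culprit:

import gc

gc.disable()                 # stop automatic generational collections
try:
    arc, arc_sub, upls, go = foo_mod.ready_set()
    upls = list(upls)
    print(len(upls))         # does the multi-second stall still happen?
finally:
    gc.enable()              # re-enable; optionally run gc.collect() later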

(Versions, just in case:

Python 2.7.6 (default, Mar 22 2014, 22:59:56)
IPython 3.2.0 -- An enhanced Interactive Python.

)

1 Answer:

Answer 0: (score: 2)

I'll summarize the comments on your question into an answer.

As everyone has said (and as you pointed out yourself), Python's list object knows its own size, and len() just returns that stored number.

Here is how list_length is defined in CPython:

static Py_ssize_t
list_length(PyListObject *a)
{
    return Py_SIZE(a);
}

And Py_SIZE(o) is documented as:

This macro is used to access the ob_size member of a Python object. It expands to:

(((PyVarObject*)(o))->ob_size)

So I conclude that len() should not perform any computation at all. The only remaining suspect is the object you are trying to convert to a list. But if you swear that it really is a list, and not some fake object that mimics a list's methods with lazy computation, then that's not it either.
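
To illustrate the distinction (a toy example of my own, the class is hypothetical and not from the question): a genuine list answers len() from its stored ob_size, while a look-alike object can hide arbitrary work behind __len__:

import time

# Toy illustration: a real list answers len() from its stored size, while a
# "lazy" look-alike (hypothetical LazyContainer) can hide work in __len__.
class LazyContainer(object):
    def __init__(self, source):
        self._source = source      # e.g. an unevaluated cursor or generator
        self._items = None

    def __len__(self):
        if self._items is None:    # the expensive work happens on first len()
            self._items = list(self._source)
        return len(self._items)

real = list(range(10 ** 7))
fake = LazyContainer(iter(range(10 ** 7)))

start = time.time(); len(real); print(time.time() - start)   # ~microseconds
start = time.time(); len(fake); print(time.time() - start)   # noticeably slower on the first call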

So I would assume that the timing methods (%timeit, time.time()) really do show the exact time that was spent calling the len function itself.

The only process that is wasting the time is... the garbage collector. At the end of your measurement it discovers that nobody is using such a huge pile of data any more and starts releasing the memory. Naturally, that takes a few seconds.
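
If you want to see the collector's cost directly, a rough check (a sketch, run in the same session right after building the list) is to force one full collection and time it; with on the order of 10^8 tracked objects, a single pass can plausibly take seconds:

import gc
import time

# Sketch: force one full collection right after building the big list and
# time it, to see how long a single pass over all tracked objects takes.
start = time.time()
unreachable = gc.collect()   # generation-2 (full) collection
print(unreachable, time.time() - start)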