Question

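This may seem like a stupid question, but I have a list of 400,000 items and its performance seems to be the same as with a list of 100 items. It seems to me that you are only limited by the amount of RAM you have and the maximum size of a list?

To be specific:

If I search this list (item in bigList), is there a performance hit?
If I append, say, 200,000 items to this 400,000-item list, is there a performance hit?
If I iterate over every item, is there a performance hit because of the list?
If there is a performance hit when using lists, what would a typical maximum size be?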

Answer 1

From the manual:

Python's lists are really variable-length arrays

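That means searching is done in O(N), where N is the length of the list. If you need a different implementation, have a look at collections. Or use a set (a hash table internally).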
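As a rough illustration of what "a different implementation" from collections can buy you, here is a minimal sketch using collections.deque. The choice of deque and the variable names are my own assumption about what might be useful here, not something the manual quote prescribes. Inserting at the front of a list shifts every element, while a deque supports appends at both ends cheaply.

```python
from collections import deque

items = list(range(5))
queue = deque(items)

# On a list, inserting at position 0 shifts every existing element: O(N).
items.insert(0, -1)

# On a deque, adding to the left end does not move the other elements: O(1).
queue.appendleft(-1)

# Both containers now hold the same sequence.
print(items)
print(list(queue))
```

Note that a deque does not make searching faster; "x in queue" is still a linear scan.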

Answer 2

If I search this list (item in bigList)

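Then you would use a set. To check whether an item is in a list, you have to look at every item in the list. A set can jump straight to where the item would be stored and see whether it is there.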
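If you want to see the difference for yourself, a small timing sketch along these lines should do. The size N, the names big_list and big_set, and the choice of a missing value as the worst case are all assumptions made for illustration; the absolute numbers depend on your machine.

```python
from timeit import timeit

N = 100_000
big_list = list(range(N))
big_set = set(big_list)   # one-time O(N) cost to build the set
missing = -1              # not present, so the list scan checks every element

# Repeat each membership test 100 times to get a measurable duration.
list_time = timeit(lambda: missing in big_list, number=100)
set_time = timeit(lambda: missing in big_set, number=100)

print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```

Keep in mind that building the set costs one pass over the data, so the conversion only pays off if you do more than a handful of lookups.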

If I append 200,000 items to this 400,000-item list, is there a performance hit?

No. An append takes the same (amortized constant) time regardless of the size of the list.
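One way to see why appends stay cheap is to count how often the list actually has to grow its underlying array. The sketch below assumes CPython, where sys.getsizeof reflects the allocated capacity of the list; the helper count_resizes is a hypothetical name made up for this example. The list over-allocates, so only a small fraction of appends trigger a reallocation.

```python
from sys import getsizeof

def count_resizes(n):
    """Append n items and count how often the list's allocation changes."""
    items = []
    resizes = 0
    last_size = getsizeof(items)
    for i in range(n):
        items.append(i)
        size = getsizeof(items)
        if size != last_size:   # the underlying array was reallocated
            resizes += 1
            last_size = size
    return resizes

n = 100_000
resizes = count_resizes(n)
print("%d appends caused %d reallocations" % (n, resizes))
```

The exact count is an implementation detail and may differ between Python versions.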

If there is a performance hit when using lists, what would a typical maximum size be?

There is none; lists can be used in many different ways. In short, lists are good at storing things and bad at finding them.

Answer 3

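These questions are very hypothetical and, judging by the answers and comments, not really appreciated. If you want to gain insight into how efficient the list type is for you, you could try profiling it yourself.

Here is a small script as a starting point: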

from sys import getsizeof
from timeit import timeit
from random import randrange

MIN_EXP, MAX_EXP = 1, 8  # list sizes from 10^MIN_EXP up to 10^(MAX_EXP - 1)
TEST_ITERATIONS = 1

class ListTest(object):
    def __init__(self, number, item):
        self.number = number
        self.item = item
        self.size = '0'
        self.time = 0.0

    def test_01_creation_list_comprehension(self):
        list_ = []
        def profile():
            # Rebind the outer name so getsizeof() below sees the built list;
            # without nonlocal it would measure the empty outer list.
            nonlocal list_
            list_ = [self.item for n in range(self.number)]
        self.time = timeit(profile, number=TEST_ITERATIONS)
        self.size = "%.3f" % (getsizeof(list_) / 1024.0,)

    def test_02_creation_list_append(self):
        list_ = []
        def profile():
            for n in range(self.number):
                list_.append(self.item)
        self.time = timeit(profile, number=TEST_ITERATIONS)
        self.size = "%.3f" % (getsizeof(list_) / 1024.0,)

    def test_03_find_item(self):
        list_ = [self.item for n in range(self.number)]
        searchstr = 'find me!'
        # randrange excludes the upper bound, so the index is always valid
        list_[randrange(self.number)] = searchstr
        def profile():
            foundya = list_.index(searchstr)
        # self.size is not updated here; it keeps the value from the previous test
        self.time = timeit(profile, number=TEST_ITERATIONS)


tests = [ListTest(10**n, 'string-item') for n in range(MIN_EXP, MAX_EXP)]
for test in tests:
    print("-" * 78)
    print("ListTest with %d items:" % test.number)
    # dir() returns names sorted alphabetically, so the tests run in order
    for subtest in [st for st in dir(test) if st.startswith('test_')]:
        getattr(test, subtest)()
        print("%15s" % "%s: %.4f s" % (subtest, test.time))
        print("%32s" % "%s %14s %s" % ("Memory used:", test.size, "kB"))

For a list with 10 million entries I get these results (100 million does not compute with my memory):

 >>>   ListTest with 10000000 items:
 >>>        test_01_creation_list_comprehension: 1.7200 s
 >>>                         Memory used:          0.031 kB
 >>>        test_02_creation_list_append: 2.8455 s
 >>>                         Memory used:      39808.621 kB
 >>>        test_03_find_item: 0.1657 s
 >>>                         Memory used:      39808.621 kB

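The memory used is more an indicator of magnitude than of actual consumption. The sys.getsizeof function is mostly correct for standard types, including gc overhead, but it does not work well for complex objects or external (as in foreign) objects. Note that the 0.031 kB shown for test_01 is the size of an empty list: in the run that produced these numbers, the list built inside profile() was a local variable, so getsizeof only saw the empty outer list_.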
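To illustrate why getsizeof is only an indicator, here is a minimal sketch, assuming CPython. The helper shallow_plus_items is a hypothetical name for this example, and it only goes one level deep. getsizeof on a list counts the list object and its array of references, not the objects the references point to.

```python
from sys import getsizeof

# Two lists of the same length, holding references to a tiny and a large string.
small = ['x'] * 1000
large = ['x' * 10_000] * 1000

# Same number of references, so getsizeof reports the same size for both lists.
print(getsizeof(small), getsizeof(large))

def shallow_plus_items(lst):
    """Size of the list plus each distinct element, counted once (one level only)."""
    seen = set()
    total = getsizeof(lst)
    for obj in lst:
        if id(obj) not in seen:   # skip objects referenced more than once
            seen.add(id(obj))
            total += getsizeof(obj)
    return total

print(shallow_plus_items(small), shallow_plus_items(large))
```

For nested or custom objects you would have to walk the whole object graph, which this sketch deliberately does not attempt.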
