Question

Hi, I'm new to Python, and I have a simple function, shown below:

def normal_list(p, data, n):
  cost = 0
  cost_list = []
  list = []
  clappend=cost_list.append
  listappend=list.append
  for i in range(n):
    x = np.random.choice(data, p=p)
    if (len(list) == 0):
        listappend(x)
    else:
        for i in list:
            if i == x:
                clappend(cost)
            elif i == list[-1]:
                    listappend(x)
            cost += 1
        cost = 0
  return cost_list

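Here p is a list of probabilities and data is a list of data; in almost every case it will be a list of the numbers 1-100. I have to speed this up because I need to call the function with n = 100000, and then it runs for a very long time and ends with a MemoryError on cost_list.

Thanks for all the suggestions, I found my mistake: I messed up breaking out of the loop after finding x in my list. So thanks again, and using a generator is a good idea.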
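
For readers who land here with the same problem, this is a minimal sketch of what the fix described above might look like. It is not the asker's final code: the name normal_list_fixed, the variable seen, and the for/else layout are assumptions for illustration. The only substantive change is the break once x has been found.

```python
import numpy as np

def normal_list_fixed(p, data, n):
    """Hypothetical corrected version of normal_list: stop scanning on a hit."""
    cost_list = []
    seen = []  # renamed from `list` to avoid shadowing the built-in
    for _ in range(n):
        x = np.random.choice(data, p=p)
        for cost, item in enumerate(seen):
            if item == x:
                cost_list.append(cost)  # index at which x was found
                break                   # the break that was missing
        else:
            # The loop ended without a break, so x is new: remember it.
            seen.append(x)
    return cost_list
```

With a single possible value, normal_list_fixed([1.0], [1], 5) stores the first draw and then finds it at index 0 four times, so it returns [0, 0, 0, 0].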

Answer 1

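You can eliminate the outer condition check. Your list is empty only once: before the first iteration. Why not initialize it up front and skip both the first iteration of the outer loop and the check entirely: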

def normal_list(p, data, n):
    cost = 0
    cost_list = []
    _list = [np.random.choice(data, p=p)]
    #clappend=cost_list.append # Don't do this! It's confusing!
    #listappend=_list.append # Don't do this! It's confusing!
    for i in range(1,n):
        ...

Answer 2

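If n is very large, consider splitting the function up and turning it into generators. Using yield instead of return saves memory by producing results "on the fly" rather than collecting everything before returning.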

This may not be a fully functional implementation of the generators described above, but it's a start:

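import numpy as np

def sub1(lst, x):
    # Search lst for x: yield (cost, None) on a hit, (None, x) on a miss.
    cost = 0
    for e in lst:
        cost += 1
        if e == x:
            yield (cost, None)
            return  # stop at the first match
    yield (None, x)  # reached the end without finding x

def normal_list(p, data, n):
    lst = []
    for i in range(n):
        x = np.random.choice(data, p=p)
        if len(lst) == 0:
            lst.append(x)  # first draw: nothing to search, just store it
        else:
            for res in sub1(lst, x):
                if res[0] is not None:
                    yield res[0]  # hand the cost to the caller immediately
                else:
                    lst.append(res[1])  # remember the new value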
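
The memory saving only materializes if the caller also avoids building a list from the generator. A rough sketch of consuming costs lazily follows; mean_cost and fake_costs are hypothetical names, and fake_costs is just a stand-in for a call like normal_list(p, data, n).

```python
from itertools import islice

def mean_cost(costs):
    """Average an iterable of costs without storing them."""
    total = 0
    count = 0
    for c in costs:
        total += c
        count += 1
    return total / count if count else 0.0

# Stand-in for normal_list(p, data, n): any iterator of costs works here.
fake_costs = (i % 3 + 1 for i in range(9))

first_three = list(islice(fake_costs, 3))  # peek at a few results only
rest_mean = mean_cost(fake_costs)          # the remaining items are averaged lazily
print(first_three, rest_mean)
```

Note that a generator can be walked only once, which is why the mean here covers just the items left after the islice call.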

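Although this looks more like a memory problem than a speed problem, you could try PyPy for a faster Python environment. (The first point should be enough, though.)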
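
On the speed side, one option neither answer mentions is to draw all samples in a single np.random.choice call (it accepts a size argument) and to replace the linear scan with a dict lookup. The sketch below is an assumption-laden variant, not the original function: it relies on the list being append-only, as in the question's code, so each value's index never changes, and it assumes the values in data are hashable. The names normal_list_batched and position are made up for illustration.

```python
import numpy as np

def normal_list_batched(p, data, n):
    """Hypothetical faster variant: one vectorised draw, O(1) position lookup."""
    draws = np.random.choice(data, size=n, p=p)  # all n samples at once
    position = {}   # value -> index in first-seen order
    cost_list = []
    for x in draws.tolist():  # tolist() gives plain Python scalars
        if x in position:
            cost_list.append(position[x])  # same index a linear scan would find
        else:
            position[x] = len(position)    # new value goes to the end
    return cost_list
```

The costs follow the question's convention (0-based index of the match), so normal_list_batched([1.0], [1], 5) returns [0, 0, 0, 0].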

Python speed and memory
