Question

I am currently running this code:

for dicword in dictionary:
    for line in train:
        for word in line:
            if dicword == word:
                pWord[i] = pWord[i] + 1
    i = i + 1

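Here dictionary and pWord are 1D lists of the same size, and train is a 2D list.

Both dictionary and train are very large, and the code runs slowly.

How can I optimize this particular code, and code like it?

Edit: train is a list of roughly 2000 lists, each containing the individual words extracted from one document. dictionary was created by extracting every unique word from all of train.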

This is how dictionary is created:

dictionary = []
for line in train:
    for word in line:
        if word not in dictionary:
            dictionary.append(word)

Edit 2: an example of the contents of each list:

[ ... , 'It', 'ran', 'at', 'the', 'same', 'time', 'as', 'some', 'other', 'programs', 'about', ...]

Answer 1

You can use a Counter.

from collections import Counter
train = [["big", "long", "list", "of", "big", "words"], ["small", "short", "list", "of", "short", "words"]]
c = Counter(word for line in train for word in line)
>>> c
Counter({'big': 2, 'list': 2, 'long': 1, 'of': 2, 'short': 2, 'small': 1, 'words': 2})

Note that the counter itself is built with a generator expression (also known as a generator comprehension).

Also note that you don't even need to create dictionary yourself. Counter creates it for you.
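If the rest of the program still expects the original pWord list, aligned index by index with dictionary, it can probably be rebuilt from the counter in a single pass. This is a minimal sketch that reuses the names from the question; the sample train data is made up for illustration, and taking list(c) as dictionary assumes Python 3.7+, where Counter keeps keys in first-seen order.

```python
from collections import Counter

# Sample data, made up for illustration.
train = [["big", "long", "list", "of", "big", "words"],
         ["small", "short", "list", "of", "short", "words"]]

# One pass over every word instead of one pass per dictionary entry.
c = Counter(word for line in train for word in line)

# Unique words in first-seen order (assumes Python 3.7+ dict ordering).
dictionary = list(c)

# Counts aligned with dictionary, like pWord in the question.
# Counter returns 0 for a missing key, so this lookup never raises KeyError.
pWord = [c[word] for word in dictionary]
```

The nested loops in the question compare every dictionary entry against every word in train, while this version touches each word once and then does one hash lookup per dictionary entry.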

You can then use a dictionary comprehension to get the most common words, for example the top 5.
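A minimal sketch of such a top-5 lookup, assuming Counter.most_common is the intended way to rank the words; the sample data and the name top5 are illustrative only.

```python
from collections import Counter

# Sample data, made up for illustration.
train = [["big", "long", "list", "of", "big", "words"],
         ["small", "short", "list", "of", "short", "words"]]

c = Counter(word for line in train for word in line)

# most_common(5) returns up to five (word, count) pairs, highest count first.
# The dictionary comprehension turns those pairs back into a plain dict.
top5 = {word: count for word, count in c.most_common(5)}
```

When only the ranking matters, the list returned by most_common can also be used directly without converting it to a dict.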

Answer 2

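Optimizing Python for loops in general?

A good strategy for processing lists with many elements is to use generators (see also the Python docs on generators). If you are streaming through a large list, transforming or aggregating its elements, you probably don't need all of it in memory at any given time.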
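As a rough sketch of that idea, the generator function below yields one word at a time, so the flattened list of words is never built in memory. The helper name iter_words, the lowercasing step, and the sample data are assumptions made for illustration and are not part of the question.

```python
from collections import Counter

def iter_words(lines):
    """Yield words one at a time instead of building a flat list."""
    for line in lines:
        for word in line:
            # Example transformation (an assumption): normalize case.
            yield word.lower()

# Sample data, made up for illustration.
train = [["It", "ran", "at", "the", "same", "time"],
         ["at", "the", "same", "TIME"]]

# Counter consumes the generator lazily, one word at a time.
counts = Counter(iter_words(train))
```

The same generator can feed other aggregations, such as sum or max, without changing how the words are produced.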
