
Question

“

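Suppose we were interested in the most often-occurring time zones in the data set (the tz field). There are many ways we could do this. First, let's extract a list of time zones again using a list comprehension: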

In [26]: time_zones = [rec['tz'] for rec in records if 'tz' in rec]
In [27]: time_zones[:10]
Out[27]: [u'America/New_York', u'America/Denver', u'America/New_York', u'America/Sao_Paulo', u'America/New_York', u'America/New_York', u'Europe/Warsaw', u'', u'', u'']

Now, to produce counts by time zone:

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

”

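This is an excerpt from a textbook. I don't quite understand the loop used to count how many times each time zone occurs. Could someone break it down intuitively for me? I'm a beginner.

Follow-up question: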

“

If we wanted the top 10 time zones and their counts, we have to do a little bit of dictionary acrobatics:

def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]

”

Quotation marks denote the excerpts. Could someone explain what is happening in the function top_counts?

Answer 1

def get_counts(sequence):  # Defines the function.
   counts = {}             # Creates an empty dictionary.
   for x in sequence:      # Loops through each item in sequence
      if x in counts:      # If item already exists in dictionary
          counts[x] += 1   # Add one to the current item in dictionary
      else:                # Otherwise...
          counts[x] = 1    # Add item to dictionary, give it a count of 1
   return counts           # Returns the resulting dictionary.
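
To see the commented function in action, here is a small sketch that runs it on the ten time zones printed in the question. It assumes Python 3, so the u'' prefixes are dropped; the expected values in the comments follow from counting the sample list by hand.

```python
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

# The first ten time zones shown in the question. Empty strings come from
# records whose tz field is blank.
time_zones = ['America/New_York', 'America/Denver', 'America/New_York',
              'America/Sao_Paulo', 'America/New_York', 'America/New_York',
              'Europe/Warsaw', '', '', '']

counts = get_counts(time_zones)
print(counts['America/New_York'])  # 4
print(counts[''])                  # 3
print(len(counts))                 # 5 distinct keys
```

Note that the blank time zone is counted like any other key, which is why '' shows up with a count of 3.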

Answer 2

The main operation here is the dictionary lookup.

if x in counts:

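checks whether the time zone has already been counted. If it exists in the counts dictionary, its count is incremented. If it does not exist yet, a new entry is created and set to 1.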
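
As a rough illustration of that lookup, the sketch below shows what the `in` test returns before and after a key is added. It also shows an equivalent shorthand using dict.get with a default of 0; the name get_counts_with_get is made up for this example and is not from the textbook.

```python
counts = {}
print('Europe/Warsaw' in counts)  # False: this time zone has not been counted yet
counts['Europe/Warsaw'] = 1
print('Europe/Warsaw' in counts)  # True: the key now exists

def get_counts_with_get(sequence):
    # Hypothetical variant: dict.get returns 0 when the key is missing,
    # so the if/else collapses into a single line.
    counts = {}
    for x in sequence:
        counts[x] = counts.get(x, 0) + 1
    return counts

print(get_counts_with_get(['a', 'b', 'a']))  # {'a': 2, 'b': 1}
```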

Answer 3

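This basically uses a dictionary (or hash table) to store how many times each time zone occurs. Each total is stored in counts, keyed by the time zone string. This lets us quickly look up the existing count so that we can increment it by one.

First, we iterate over each value in sequence: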

for x in sequence:

On each iteration, x is equal to the current value. For example, on the first iteration, x is equal to America/New_York.

Next, we have this confusing part:

if x in counts:
   counts[x] += 1 
else:
   counts[x] = 1

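Since you can't increment something that doesn't exist, we first need to check whether the key already exists in the map. If we have never encountered that time zone before, it won't be there. In that case we set its initial value to 1, because we know it has occurred at least once so far.

If it does exist (x is in counts), we just need to increment the value for that key by one: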

counts[x] += 1

Hope that makes more sense now!
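
If it helps, here is a small trace of the same loop on a three-item list, printing the dictionary after every iteration. The snapshots list is only there for illustration, and the output shown assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
sequence = ['America/New_York', 'America/Denver', 'America/New_York']

counts = {}
snapshots = []  # a copy of counts after each iteration, for inspection
for x in sequence:
    if x in counts:
        counts[x] += 1   # seen before: bump the existing total
    else:
        counts[x] = 1    # first sighting: start the total at 1
    snapshots.append(dict(counts))
    print(x, '->', counts)

# America/New_York -> {'America/New_York': 1}
# America/Denver -> {'America/New_York': 1, 'America/Denver': 1}
# America/New_York -> {'America/New_York': 2, 'America/Denver': 1}
```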

Answer 4

Given the sequence [u'America/New_York', u'America/Denver', u'America/New_York', u'America/Sao_Paulo', u'America/New_York', u'America/New_York', u'Europe/Warsaw', u'', u'', u'']

it would go like this:

def get_counts(sequence):
    counts = {}
    for x in sequence:      # traverse sequence; u'America/New_York' is the first item
        if x in counts:     # if u'America/New_York' in counts:
            counts[x] += 1  #     counts[u'America/New_York'] += 1
        else:               # else:
            counts[x] = 1   #     counts[u'America/New_York'] = 1
                            # and so on...
    return counts

Answer 5

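The function get_counts does the following:

For each time zone in the list:

Check whether the time zone is already in the dictionary (if x in counts).
If so, increase its number of occurrences by 1 (counts[x] += 1).
If not, initialize the count to 1 (counts[x] = 1).

If you're curious, you could also do it this way: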

from collections import Counter
ctr = Counter()
for x in sequence:
    ctr[x] += 1

A Counter automatically returns 0 for missing items, so you don't need to initialize them.
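
A slightly shorter sketch of the same idea: Counter can also be built directly from the list, and its most_common method returns the highest counts first. The sample list is the one from the question with the u'' prefixes dropped (Python 3 assumed), and 'Asia/Tokyo' is just an arbitrary key that is not in the data.

```python
from collections import Counter

time_zones = ['America/New_York', 'America/Denver', 'America/New_York',
              'America/Sao_Paulo', 'America/New_York', 'America/New_York',
              'Europe/Warsaw', '', '', '']

ctr = Counter(time_zones)        # counts every item in one step
print(ctr['America/New_York'])   # 4
print(ctr['Asia/Tokyo'])         # 0: a missing key reads as zero, no KeyError
print(ctr.most_common(2))        # [('America/New_York', 4), ('', 3)]
```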

Answer 6

Re: the follow-up question.

def top_counts(count_dict, n=10):
    # Converts the dictionary into a list of tuples, i.e.
    # {'aaa': 1, 'bbb': 12, 'ccc': 4} into [(1, 'aaa'), (12, 'bbb'), (4, 'ccc')].
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    # Sorts the list. The default comparison for tuples compares first elements
    # first, and only if they are equal looks at second elements.
    value_key_pairs.sort()
    return value_key_pairs[-n:]  # Returns the last n elements of the sorted list.
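
To make the tuple ordering concrete, here is a small sketch that runs top_counts on the counts for the ten sample time zones from the question. The counts dictionary is written out by hand for this example, and Python 3 is assumed.

```python
def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]

# Counts for the ten sample time zones, written out by hand.
counts = {'America/New_York': 4, 'America/Denver': 1,
          'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 3}

# Tuples compare element by element, so the count decides the order first.
print((1, 'b') < (2, 'a'))  # True: 1 < 2, the strings are never compared
print((1, 'a') < (1, 'b'))  # True: counts tie, so the strings break the tie

top = top_counts(counts, n=2)
print(top)  # [(3, ''), (4, 'America/New_York')]
```

The result is in ascending order, so the most frequent time zone is the last element of the returned list.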

I don't understand this simple loop
