如何反转索引计数器并将其转换为defaultdict?蟒蛇

时间:2016-10-25 09:03:40

标签: python dictionary counter counting defaultdict

给出一个计数器,例如:

>>> from collections import Counter
>>> Counter('123112415121361273')
Counter({'1': 7, '2': 4, '3': 3, '5': 1, '4': 1, '7': 1, '6': 1})

如何反转索引并将计数作为键和值作为原始字符串键列表?

目的是将上面的例子转换成这样的东西:

defaultdict(<type 'list'>, {1: ['5', '4', '7', '6'], 3: ['3'], 4: ['2'], 7: ['1']})

我尝试通过Counter尝试手动重复:

>>> from collections import Counter
>>> Counter('123112415121361273')
Counter({'1': 7, '2': 4, '3': 3, '5': 1, '4': 1, '7': 1, '6': 1})
>>> x = Counter('123112415121361273')
>>> from collections import Counter, defaultdict
>>> y = defaultdict(list)
>>> for s, count in x.items():
...     y[count].append(s)
... 
>>> y
defaultdict(<type 'list'>, {1: ['5', '4', '7', '6'], 3: ['3'], 4: ['2'], 7: ['1']})

但还有其他方法吗?

由于输入是字符串'123112415121361273'并且输出应该是由计数索引的字典,是否有任何方法可以避免计数步骤在第一次迭代它时到达结果是defaultdict?

2 个答案:

答案 0 :(得分:1)

不,没有更有效的方式。

计算最好使用映射,这正是Counter 所做的。由于在完全遍历字符串之前,您不知道任何字符的最终计数,因此在完成计数之前,您无法预先知道要将字符存入哪个存储桶。

因此,无效替代方案是从计数到字符的映射开始,然后将字符移动到下一个存储桶,因为您发现它们已经有计数。发现它们已经有计数需要你对每个桶进行测试,这样就成了O(NK)解决方案,而不是Counter给你的直接线性O(N)解决方案。

## Warning: this is not an efficient approach; use for illustration purposes only

from collections import defaultdict

s = '123112415121361273'
count_to_char = defaultdict(set)  # use a set to avoid O(N**2) performance
max_count = 0
for char in s:  # loop over N items
    for i in range(1, max_count + 1):  # loop over up to K buckets
        if char in count_to_char[i]:
            count_to_char[i].remove(char)
            count_to_char[i + 1].add(char)
            break
    else:
        i = 0
        count_to_char[1].add(char)
    max_count = max(i + 1, max_count)
# remove empty buckets again
for count in [c for c, b in count_to_char.items() if not b]:
    del count_to_char[count]
# alternative method to clear empty buckets, producing a regular dict
# count_to_char = {c: b for c, b in count_to_char.items() if b}

避免对K-bucket进行扫描的方法是使用已经使用过的计数器。

答案 1 :(得分:1)

from timeit import timeit
from random import choice
from collections import Counter, defaultdict
from string import printable

def str_count(input_num, defaultdict=defaultdict):

    d = defaultdict(list)
    for count, s in map(lambda x: (input_num.count(x), x), set(input_num)):
        d[count].append(s)
    return d

def counter(input_num, defaultdict=defaultdict, Counter=Counter):
    x = Counter(input_num)
    y = defaultdict(list)
    for s, count in x.items():
        y[count].append(s)
    return y

def pieters_default_dict(input_num, defaultdict=defaultdict):
    x = defaultdict(int)
    for c in input_num:
        x[c] += 1
    y = defaultdict(list)
    for s, count in x.items():
        y[count].append(s)
    return y

def pieters_buckets(input_num, defaultdict=defaultdict):
    ## Warning: this is not an efficient approach; use for illustration purposes only
    count_to_char = defaultdict(set)  # use a set to avoid O(N**2) performance
    max_count = 0
    for char in input_num:  # loop over N items
        for i in range(1, max_count + 1):  # loop over up to K buckets
            if char in count_to_char[i]:
                count_to_char[i].remove(char)
                count_to_char[i + 1].add(char)
                break
        else:
            i = 0
            count_to_char[1].add(char)
        max_count = max(i + 1, max_count)
    # remove empty buckets again
    for count in [c for c, b in count_to_char.items() if not b]:
        del count_to_char[count]
    return count_to_char

test = ''.join([choice(printable) for _ in range(1000)])
number = 100

print('str_count:                 ', timeit('f(t)', 'from __main__ import str_count as f, test as t', number=number))
print('pieters_default_dict:      ', timeit('f(t)', 'from __main__ import pieters_default_dict as f, test as t', number=number))
print('Counter:                   ', timeit('f(t)', 'from __main__ import counter as f, test as t', number=number))
print('pieters_buckets:           ', timeit('f(t)', 'from __main__ import pieters_buckets as f, test as t', number=number))

Timeit with Python 2.7.12和iteritems()返回:

pieters_default_dict:       0.013843059539794922
str_count:                  0.016570091247558594
Counter:                    0.030740022659301758
pieters_buckets:            0.1262810230255127

在Python 3.5.2和items()上:

Counter:                    0.00771436400100356
pieters_default_dict:       0.013124741999490652
str_count:                  0.017287666001720936
pieters_buckets:            0.11816959099996893

更新

  • 感谢Martijn Pieters,从str_count()函数中消除了不必要的循环。
  • 此脚本在Python 2.7.12上进行了测试

更新2

  • 从功能中删除导入,跟进评论。
  • 修正了从问题中复制的代码,因为正在创建两个计数器。

更新3

  • 将时间结果添加到Pieters详细说明的另外两个版本的脚本中。
  • 在Python 3.5.2上添加了timeit结果
  • pieters_buckets()方法仅用于说明目的,因此只是为了好奇而定时。

更新4

  • 修正了使用随机字符串和更多重复次数的代码。