Given a Counter, for example:
>>> from collections import Counter
>>> Counter('123112415121361273')
Counter({'1': 7, '2': 4, '3': 3, '5': 1, '4': 1, '7': 1, '6': 1})
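How can the index be inverted so that the counts become the keys and the values are lists of the original string keys?
The goal is to turn the example above into something like this: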
defaultdict(<type 'list'>, {1: ['5', '4', '7', '6'], 3: ['3'], 4: ['2'], 7: ['1']})
I tried doing this by iterating over the Counter manually:
>>> from collections import Counter
>>> Counter('123112415121361273')
Counter({'1': 7, '2': 4, '3': 3, '5': 1, '4': 1, '7': 1, '6': 1})
>>> x = Counter('123112415121361273')
>>> from collections import Counter, defaultdict
>>> y = defaultdict(list)
>>> for s, count in x.items():
...     y[count].append(s)
...
>>> y
defaultdict(<type 'list'>, {1: ['5', '4', '7', '6'], 3: ['3'], 4: ['2'], 7: ['1']})
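But is there another way?
Since the input is the string '123112415121361273' and the output should be a dictionary indexed by count, is there any way to skip the separate counting step and arrive at the resulting defaultdict in a single pass over the string?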
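Answer 0 (score: 1)
No, there is no more efficient way.
Counting is best done with a mapping, which is exactly what Counter does. Because you do not know the final count of any character until you have traversed the whole string, you cannot know which bucket a character belongs in until counting is complete.
The inefficient alternative is therefore to start with a mapping from count to characters and move a character to the next bucket whenever you find it already has a count. Finding that it already has a count requires testing each bucket, which makes this an O(NK) solution rather than the straightforward linear O(N) solution that Counter gives you.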
## Warning: this is not an efficient approach; use for illustration purposes only
from collections import defaultdict
s = '123112415121361273'
count_to_char = defaultdict(set)  # use a set to avoid O(N**2) performance
max_count = 0
for char in s:  # loop over N items
    for i in range(1, max_count + 1):  # loop over up to K buckets
        if char in count_to_char[i]:
            count_to_char[i].remove(char)
            count_to_char[i + 1].add(char)
            break
    else:
        i = 0
        count_to_char[1].add(char)
    max_count = max(i + 1, max_count)
# remove empty buckets again
for count in [c for c, b in count_to_char.items() if not b]:
    del count_to_char[count]
# alternative method to clear empty buckets, producing a regular dict
# count_to_char = {c: b for c, b in count_to_char.items() if b}
The way to avoid scanning the K buckets is to use a counter, which is what you are already doing.
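For comparison, here is a minimal sketch of the two-pass approach wrapped in a function. The name invert_counts is made up for illustration, and sorting each bucket and returning a plain dict are assumptions added here so the result does not depend on dictionary ordering; neither is required by the question.

```python
from collections import Counter, defaultdict


def invert_counts(text):
    """Map each count to the sorted list of characters occurring that many times.

    Hypothetical helper, not part of the original answer.
    """
    by_count = defaultdict(list)
    # First pass: Counter tallies every character in O(N).
    # Second pass: visit each distinct character once and file it under its count.
    for char, count in Counter(text).items():
        by_count[count].append(char)
    # Sort each bucket so the output is independent of dict ordering, and
    # return a plain dict so looking up a missing count does not create a bucket.
    return {count: sorted(chars) for count, chars in by_count.items()}


result = invert_counts('123112415121361273')
print(result)
```

The extra sort costs a little time per bucket; drop it if the order of characters within a bucket does not matter.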
Answer 1 (score: 1)
from timeit import timeit
from random import choice
from collections import Counter, defaultdict
from string import printable
def str_count(input_num, defaultdict=defaultdict):
    d = defaultdict(list)
    for count, s in map(lambda x: (input_num.count(x), x), set(input_num)):
        d[count].append(s)
    return d
def counter(input_num, defaultdict=defaultdict, Counter=Counter):
    x = Counter(input_num)
    y = defaultdict(list)
    for s, count in x.items():
        y[count].append(s)
    return y
def pieters_default_dict(input_num, defaultdict=defaultdict):
    x = defaultdict(int)
    for c in input_num:
        x[c] += 1
    y = defaultdict(list)
    for s, count in x.items():
        y[count].append(s)
    return y
def pieters_buckets(input_num, defaultdict=defaultdict):
    ## Warning: this is not an efficient approach; use for illustration purposes only
    count_to_char = defaultdict(set)  # use a set to avoid O(N**2) performance
    max_count = 0
    for char in input_num:  # loop over N items
        for i in range(1, max_count + 1):  # loop over up to K buckets
            if char in count_to_char[i]:
                count_to_char[i].remove(char)
                count_to_char[i + 1].add(char)
                break
        else:
            i = 0
            count_to_char[1].add(char)
        max_count = max(i + 1, max_count)
    # remove empty buckets again
    for count in [c for c, b in count_to_char.items() if not b]:
        del count_to_char[count]
    return count_to_char
test = ''.join([choice(printable) for _ in range(1000)])
number = 100
print('str_count: ', timeit('f(t)', 'from __main__ import str_count as f, test as t', number=number))
print('pieters_default_dict: ', timeit('f(t)', 'from __main__ import pieters_default_dict as f, test as t', number=number))
print('Counter: ', timeit('f(t)', 'from __main__ import counter as f, test as t', number=number))
print('pieters_buckets: ', timeit('f(t)', 'from __main__ import pieters_buckets as f, test as t', number=number))
Timeit results with Python 2.7.12 and iteritems():
pieters_default_dict: 0.013843059539794922
str_count: 0.016570091247558594
Counter: 0.030740022659301758
pieters_buckets: 0.1262810230255127
On Python 3.5.2 with items():
Counter: 0.00771436400100356
pieters_default_dict: 0.013124741999490652
str_count: 0.017287666001720936
pieters_buckets: 0.11816959099996893
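Before reading too much into the timings, it may be worth confirming that the variants actually agree. The sketch below compares compact versions of the Counter and str.count approaches. The names via_counter, via_str_count and normalize are made up for illustration; normalize turns each bucket into a frozenset because the order of characters inside a bucket can differ between implementations (and the bucket-shuffling variant returns sets rather than lists).

```python
from collections import Counter, defaultdict


def via_counter(text):
    # Count once with Counter, then file each character under its count.
    buckets = defaultdict(list)
    for char, count in Counter(text).items():
        buckets[count].append(char)
    return buckets


def via_str_count(text):
    # Call str.count once per distinct character instead of using a mapping.
    buckets = defaultdict(list)
    for char in set(text):
        buckets[text.count(char)].append(char)
    return buckets


def normalize(buckets):
    """Drop empty buckets and compare bucket contents regardless of order."""
    return {count: frozenset(chars) for count, chars in buckets.items() if chars}


sample = '123112415121361273'
same = normalize(via_counter(sample)) == normalize(via_str_count(sample))
print(same)
```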