Question

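The following procedural code snippet computes the character frequencies of a text string and writes them into a dictionary. The dictionary uses the characters as keys and the frequencies as values.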

text = "asampletextstring"
char_count = {}
for char in text:
    if char_count.get(char):
        char_count[char] += 1
    else:
        char_count[char] = 1

My question is: can the snippet above be rewritten as a comprehension?

Answer 1

It is possible, but inefficient:

text = "asampletextstring"

char_count = { char : text.count(char) for char in text }

print(char_count)

Output:

{'s': 2, 'x': 1, 'p': 1, 'm': 1, 'e': 2, 'r': 1, 'n': 1, 'g': 1, 'a': 2, 'i': 1, 'l': 1, 't': 3}

You could write a shorter version of your loop instead:

char_count = {}
for char in text:
    char_count[char] = char_count.get(char, 0) + 1

Answer 2

You can use set() here to avoid visiting the same character two or more times.

text = "asampletextstring"
dict1 = {ch: text.count(ch) for ch in set(text)}

print(dict1)
{'s': 2, 'r': 1, 'i': 1, 'n': 1, 'a': 2, 'e': 2, 'p': 1, 't': 3, 'x': 1, 'l': 1, 'g': 1, 'm': 1}
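
To make the saving concrete, here is a small sketch (the variable names are only for illustration) that compares how many times text.count() would be called with and without set(), and checks that both comprehensions build the same dictionary:

```python
text = "asampletextstring"

# Iterating over text calls count() once per character, duplicates included.
calls_plain = len(text)
# Iterating over set(text) calls count() once per distinct character.
calls_with_set = len(set(text))

print(calls_plain, calls_with_set)  # 17 12

plain = {ch: text.count(ch) for ch in text}
deduplicated = {ch: text.count(ch) for ch in set(text)}

# Both comprehensions map each character to the same frequency.
assert plain == deduplicated
```

Each count() call still scans the whole string, so set() only reduces the number of scans; it does not make a single scan cheaper.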

Answer 3

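I got curious about the performance of the various approaches and wanted to show that comprehensions are not always the best choice, so I timed a dict comprehension, a dict comprehension over the input converted to a set, and a traditional for loop. It makes sense that the comprehension is expensive here, because .count() traverses the whole text every time to count the frequency of a single char.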

from timeit import timeit

print('Approach 1 without set comprehension: {}'.format(timeit('{ch: text.count(ch) for ch in text}',setup='text = "asampletextstring"',number=1000000)))
print('Approach 2 with set comprehension: {}'.format(timeit('{ch: text.count(ch) for ch in set(text)}',setup='text = "asampletextstring"',number=1000000)))
print('Approach 3 simple loops :{}'.format(timeit('for c in text:char_count[c] = char_count.get(c, 0) + 1',setup='text = "asampletextstring";char_count={};',number=1000000)))
print('Approach 4 Counter :{}'.format(timeit('Counter(text)',setup='text = "asampletextstring";from collections import Counter;',number=1000000)))

Output:

Approach 1 without set comprehension: 4.43441867505
Approach 2 with set comprehension: 3.98101747791
Approach 3 simple loops :2.60219633984
Approach 4 Counter :7.54261124884
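
One detail worth keeping in mind: timeit runs the setup string only once, so in the loop measurement the same char_count dictionary is reused across all repetitions instead of starting empty each time. A possible variant, sketched here with made-up function names and a smaller repetition count, wraps each approach in a function so every run starts from scratch. Timings depend on the machine and Python version, so none are shown.

```python
from collections import Counter
from timeit import timeit

text = "asampletextstring"

def with_comprehension():
    return {ch: text.count(ch) for ch in text}

def with_set_comprehension():
    return {ch: text.count(ch) for ch in set(text)}

def with_loop():
    char_count = {}  # fresh dictionary on every call
    for c in text:
        char_count[c] = char_count.get(c, 0) + 1
    return char_count

def with_counter():
    return Counter(text)

approaches = [with_comprehension, with_set_comprehension, with_loop, with_counter]

for approach in approaches:
    # Sanity check: every approach must agree with the plain loop.
    assert approach() == with_loop()
    # timeit accepts a callable as the statement to time.
    print(approach.__name__, timeit(approach, number=10000))
```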

Answer 4

Rewriting it: not really, I don't see any easy way. The best I came up with requires an additional dictionary.

d = {}
{ c: d.get(c, 0)  for c in text if d.update( {c: d.get(c,0) + 1} ) or True}

One will be able to get a one-liner in Python 3.8, but only by (ab)using assignment expressions.
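
For illustration, here is one way such an (ab)use might look. This is only a sketch, not necessarily what was meant above; the function name char_frequencies and the variable counts are made up, and it needs Python 3.8 or later.

```python
def char_frequencies(text):
    counts = {}
    # On each step, := rebinds counts to a copy with the current character's
    # tally incremented, and [c] reads that running tally back out.
    # The last value stored for each key is the character's total frequency.
    return {c: (counts := {**counts, c: counts.get(c, 0) + 1})[c] for c in text}

print(char_frequencies("asampletextstring"))
```

Copying the dictionary on every step makes this slower than the plain loop, so it is a curiosity rather than a recommendation.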
