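Best way to count char occurrences in a string

Time: 2012-01-21 09:55:54

Tags: python string optimization performancecounter

Hello, I am trying to write these Python lines in a single line, but I get errors because the code modifies a dictionary.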

for i in range(len(string)):
    if string[i] in dict:
        dict[string[i]] += 1

The general syntax, as I understand it, is

abc = [i for i in len(x) if x[i] in array]

Can someone tell me how to make this work, given that I am adding 1 to a value in the dictionary?

Thanks

6 answers:

Answer 0 (score: 7)

What you are trying to do can be done with dict, a generator expression and str.count():

abc = dict((c, string.count(c)) for c in string)

An alternative using set(string) (from soulcheck's comment below):

abc = dict((c, string.count(c)) for c in set(string))
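A minimal sketch comparing the two variants. The sample input and the name `text` are assumptions made for illustration (`text` replaces `string` so the standard-library `string` module is not shadowed). Both variants produce the same dictionary; iterating over set(text) calls str.count() once per distinct character instead of once per position.

```python
# Hypothetical sample input, not taken from the original answer.
text = "hello world"

# One str.count() call per position, so repeated characters are recounted.
counts_all = dict((c, text.count(c)) for c in text)

# One str.count() call per distinct character.
counts_set = dict((c, text.count(c)) for c in set(text))

print(counts_all == counts_set)
print(counts_set["l"], counts_set["o"])
```

On long inputs with few distinct characters the second form should do far less work, which is consistent with the timings reported for the long text.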

Timing

After seeing the comments below, I ran some tests on this and the other answers (using python-3.2).

Test functions:

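from collections import Counter, defaultdict

# time_me: the answerer's timing decorator (not included in the answer)
@time_me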
def test_dict(string, iterations):
    """dict((c, string.count(c)) for c in string)"""
    for i in range(iterations):
        dict((c, string.count(c)) for c in string)

@time_me
def test_set(string, iterations):
    """dict((c, string.count(c)) for c in set(string))"""
    for i in range(iterations):
        dict((c, string.count(c)) for c in set(string))

@time_me
def test_counter(string, iterations):
    """Counter(string)"""
    for i in range(iterations):
        Counter(string)

@time_me
def test_for(string, iterations, d):
    """for loop from cha0site"""
    for i in range(iterations):
        for c in string:
            if c in d:
                d[c] += 1

@time_me
def test_default_dict(string, iterations):
    """defaultdict from joaquin"""
    for i in range(iterations):
        mydict = defaultdict(int)
        for mychar in string:
            mydict[mychar] += 1

Test execution:

import string

d_ini = dict((c, 0) for c in string.ascii_letters)
words = ['hand', 'marvelous', 'supercalifragilisticexpialidocious']

for word in words:
    print('-- {} --'.format(word))
    test_dict(word, 100000)
    test_set(word, 100000)
    test_counter(word, 100000)
    test_for(word, 100000, d_ini)
    test_default_dict(word, 100000)
    print()

print('-- {} --'.format('Pride and Prejudcie - Chapter 3 '))

test_dict(ch, 1000)
test_set(ch, 1000)
test_counter(ch, 1000)
test_for(ch, 1000, d_ini)
test_default_dict(ch, 1000)

Test results:

-- hand --
389.091 ms -  dict((c, string.count(c)) for c in string)
438.000 ms -  dict((c, string.count(c)) for c in set(string))
867.069 ms -  Counter(string)
100.204 ms -  for loop from cha0site
241.070 ms -  defaultdict from joaquin

-- marvelous --
654.826 ms -  dict((c, string.count(c)) for c in string)
729.153 ms -  dict((c, string.count(c)) for c in set(string))
1253.767 ms -  Counter(string)
201.406 ms -  for loop from cha0site
460.014 ms -  defaultdict from joaquin

-- supercalifragilisticexpialidocious --
1900.594 ms -  dict((c, string.count(c)) for c in string)
1104.942 ms -  dict((c, string.count(c)) for c in set(string))
2513.745 ms -  Counter(string)
703.506 ms -  for loop from cha0site
935.503 ms -  defaultdict from joaquin

# !!!: Do not compare this last result with the others because it is timed
#      with 1000 iterations instead of 100000
-- Pride and Prejudcie - Chapter 3  --
155315.108 ms -  dict((c, string.count(c)) for c in string)
982.582 ms -  dict((c, string.count(c)) for c in set(string))
4371.579 ms -  Counter(string)
1609.623 ms -  for loop from cha0site
1300.643 ms -  defaultdict from joaquin

Answer 1 (score: 7)

An alternative for Python 2.7+:

from collections import Counter

abc = Counter('asdfdffa')
print(abc)
print(abc['a'])

Output:

Counter({'f': 3, 'a': 2, 'd': 2, 's': 1})
2
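A short follow-up sketch of two other Counter behaviours that are handy for character counts, reusing the same sample string. Looking up a character that never occurs gives 0 rather than raising KeyError, and most_common(n) returns the n most frequent entries.

```python
from collections import Counter

abc = Counter('asdfdffa')

# Characters that never occur return 0 instead of raising KeyError.
print(abc['z'])

# most_common(n) lists the n most frequent characters with their counts.
print(abc.most_common(1))
```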

Answer 2 (score: 6)

This is a job for the collections module:


Option 1 - collections.defaultdict

>>> from collections import defaultdict
>>> mydict = defaultdict(int)

Then your loop becomes:

>>> for mychar in mystring: mydict[mychar] += 1

Option 2 - collections.Counter (from Felix's comment):

For this specific case there is an even better alternative from the same collections module:

>>> from collections import Counter

Then you only need (!!!):

>>> mydict = Counter(mystring)

Counter is only available in Python 2.7 and later. So for Python < 2.7 you should use defaultdict.
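One way to combine the two options is sketched below, assuming a hypothetical helper named count_chars (not part of the original answer). It uses Counter when the import succeeds and falls back to the defaultdict loop otherwise.

```python
from collections import defaultdict

try:
    from collections import Counter  # added in Python 2.7
except ImportError:
    Counter = None


def count_chars(mystring):
    """Hypothetical helper: return a plain dict of character counts."""
    if Counter is not None:
        return dict(Counter(mystring))
    # Fallback for interpreters without Counter.
    mydict = defaultdict(int)
    for mychar in mystring:
        mydict[mychar] += 1
    return dict(mydict)


print(count_chars('asdfdffa') == {'a': 2, 's': 1, 'd': 2, 'f': 3})
```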

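Answer 3 (score: 1)

This is not a good candidate for a list comprehension. You generally want to use list comprehensions to build lists, and having side effects (changing global state) inside them is not a good idea.

On the other hand, your code could be better written as: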

for c in string:
    if c in dict:
        dict[c] += 1

Or, if you really want to be functional (I have renamed dict to d because I need Python's built-in dict function):

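# Add each character's full count in one step; only keys already in d are touched.
d.update(dict([(c, d[c] + string.count(c)) for c in set(string) if c in d]))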

Note that I am not changing d inside the list comprehension, but rather updating d outside of it.
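A related sketch that goes one step further and avoids mutating d at all. The inputs are hypothetical: `text` stands in for the question's `string`, and d is assumed to be pre-populated with the characters of interest. A new dictionary is built from d, so the original stays unchanged.

```python
# Hypothetical inputs, chosen only for illustration.
text = 'banana'
d = {'a': 0, 'b': 0, 'z': 0}

# Build a new dict from d; d itself is not modified.
new_d = {k: v + text.count(k) for k, v in d.items()}

print(new_d == {'a': 3, 'b': 1, 'z': 0})
print(d == {'a': 0, 'b': 0, 'z': 0})
```

Characters in text that are not keys of d (here 'n') are simply ignored, which matches the `if c in dict` check in the question's loop.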

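Answer 4 (score: -1)

Your original loop is hopelessly unpythonic. There is no need to iterate over range(len(string)) if all you want is to iterate over the letters in string. Do this instead: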

for c in my_string:
    if c in my_dict:
        my_dict[c] += 1

Answer 5 (score: -1)

>>> def count(s):
    seen = []  # characters already reported
    for ch in s:
        if ch not in seen:
            seen.append(ch)
            k = 0
            for other in s:
                if ch == other:
                    k += 1
            print('count of char {0}:{1}'.format(ch, k))


>>> count('masterofalgorithm')
count of char m:2
count of char a:2
count of char s:1
count of char t:2
count of char e:1
count of char r:2
count of char o:2
count of char f:1
count of char l:1
count of char g:1
count of char i:1
count of char h:1
>>>