Iterating through strings to count letter frequency

Date: 2015-04-16 22:33:43

Tags: python iteration

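I need to iterate through the strings I have built: write 10 random strings to a text file, then read the file back and find the frequency of each letter in each string. I know my code reads the file back, but I can't figure out how to get the count for each character. Any help?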

import string
import random
from collections import Counter

print "******************************"

print "********* EXERCISE 5 *********"

print "******************************"

print "\n**** BEGIN RANDOM STRING *****\n"


def random_string_generator():
    size = random.randint(20, 80)
    return "".join(random.choice(string.ascii_lowercase + string.ascii_uppercase)
                   for _ in range(size))

def main():
    with open("exercise_five.dat", "w+") as f:
        for x in range(0, 10):
            data = random_string_generator()
            f.write(data + "\n")
        f.close()
    with open("exercise_five.dat", 'r') as f:
        count = 0
        c = Counter()
        for i in f:
            print i
        print "Count: %i" % count

if __name__ == '__main__': main()
print "*******************************"

The final output should look like this:

***** BEGIN RANDOM STRING *****
xGYMSlMHGQAMNrSzXWqphkGntMpyjMoHyRDzaNOcmVtoeAZzcV
A ==> 2 D ==> 1 G ==> 3 H ==> 2 M ==> 5 O ==> 1 N ==> 2 Q ==> 1 S ==> 2
R ==> 1 W ==> 1 V ==> 2 Y ==> 1 X ==> 1 Z ==> 1 a ==> 1 c ==> 2 e ==> 1
h ==> 1 k ==> 1 j ==> 1 m ==> 1 l ==> 1 o ==> 2 n ==> 1 q ==> 1 p ==> 2
r ==> 1 t ==> 2 y ==> 2 x ==> 1 z ==> 3
*******************************
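In the expected output above, each string's counts are printed as `X ==> n` pairs, nine pairs per row. A minimal sketch of that layout follows. It assumes the nine-per-row wrap is intentional and prints the pairs in sorted order (the order in the sample looks like unsorted Python 2 dict order); `format_counts` and `per_line` are hypothetical names.

```python
from collections import Counter


def format_counts(text, per_line=9):
    """Return 'X ==> n' pairs for each character of text, per_line pairs per row.

    per_line=9 is an assumption read off the sample output.
    """
    counts = Counter(text)
    # sorted() puts 'A'-'Z' before 'a'-'z' because of their ASCII codes
    pairs = ['{} ==> {}'.format(ch, n) for ch, n in sorted(counts.items())]
    # Slice the flat list of pairs into rows of per_line items each
    rows = [' '.join(pairs[i:i + per_line]) for i in range(0, len(pairs), per_line)]
    return '\n'.join(rows)


sample = 'xGYMSlMHGQAMNrSzXWqphkGntMpyjMoHyRDzaNOcmVtoeAZzcV'
print(sample)
print(format_counts(sample))
```

On the sample string this yields the same 32 counts as the expected output, only in sorted order.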

My code's output currently looks like this:

**** BEGIN RANDOM STRING *****

QheDRPpVwDnfYWYMJQwEedJsjApRVafvMYUYuepYSerkoMgCTnHLSHwCitBr

zOFvifcwkrwXLxTrodqkxNxWVHdHDJZbYlcYjAUKz

DRgFXVkbtwpRfXPjzJmXYW

mpkVgUyvHEHAKUWpMZBYIKenicfdcBhxlqCZHFgxoFEmJjtrPykCzvQnFkTHfVthII

zEXLmudQVlpVQYexAvGFTBeUuZvqTO

KSRcpBlfNwcMoNViHFhS

QhTiBLuGCsClezAiVFYODiJXAQCQjwnBnHjWqlsZlljA

iYHznFLFeKwLtynubHTRtGGwjACdGlCpZSQcqnTSWVmufpHQRkwWYiajarnqNuzUzSC

NWlGeJFFcYwacXuUHWqmzSJmsrnWRvpmdSesXXmECuvAMkxGYpHv

WVAAiDgGaGnovCbbdazNGmWXARgdSfqCSztsNTPBdLumIXiDh

*******************************

2 answers:

Answer 0 (score: 2)

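Starting where you define the Counter, you can initialize a Counter with each line you read from the file. That gives you a Counter instance with keys and values, much like a dictionary: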

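from collections import Counter

with open("exercise_five.dat", 'r') as f:
    for line in f:
        line = line.rstrip("\n")  # drop the trailing newline so it is not counted as a character
        c = Counter(line)         # maps each character to how often it occurs in this line
        print(' '.join('{} ==> {}'.format(key, val) for key, val in c.items()))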

A more detailed explanation of the last line:

>>> c = Counter("text")  # initialize a Counter object with the string "text"
>>> c.keys()  # this instance has `keys` and `values`, similar to a dictionary
dict_keys(['e', 't', 'x'])
>>> c.items()  # you can access both keys and values at the same time with `items`
dict_items([('e', 1), ('t', 2), ('x', 1)])
>>> c
Counter({'t': 2, 'e': 1, 'x': 1})
>>> for key, val in c.items():
...     print(key, val)
... 
e 1
t 2
x 1

At this point, all you need is some string formatting to produce the output format you want, which is what the print(' '.join(...)) construct does.
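Combining this answer's Counter loop with the question's file-writing step gives a complete Python 3 sketch of the exercise. It strips the trailing newline before counting (otherwise a newline entry shows up in every result) and sorts the pairs for a stable order. Having `main()` take a path and return the `(line, Counter)` pairs is an assumption added for illustration, and the pairs are printed on one row rather than wrapped.

```python
import random
import string
from collections import Counter


def random_string_generator():
    # Same idea as the question: 20 to 80 random ASCII letters
    size = random.randint(20, 80)
    return ''.join(random.choice(string.ascii_letters) for _ in range(size))


def main(path='exercise_five.dat', n_strings=10):
    # Write n_strings random strings to the file, one per line
    with open(path, 'w') as f:
        for _ in range(n_strings):
            f.write(random_string_generator() + '\n')

    results = []
    # Read the file back and count the letters in each line
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')  # the newline is not a letter, so drop it
            c = Counter(line)
            print(line)
            print(' '.join('{} ==> {}'.format(k, v) for k, v in sorted(c.items())))
            results.append((line, c))
    return results


if __name__ == '__main__':
    main()
```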

Answer 1 (score: 1)

defaultdict is a good tool for this:

import collections

occurrences = collections.defaultdict(int)

word = 'ASDqasdqASD'

for c in word:
    occurrences[c] += 1
print(occurrences)  # the output below is from Python 2
> defaultdict(<type 'int'>, {'A': 2, 'a': 1, 'D': 2, 's': 1, 'q': 2, 'S': 2, 'd': 1})
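The same loop can be wrapped in a function and called once per line read from the file. The small sketch below does that (the function name `letter_frequencies` is hypothetical) and checks the result against `collections.Counter`, which builds the same mapping in one call.

```python
import collections


def letter_frequencies(word):
    # defaultdict(int) supplies 0 the first time a missing key is looked up
    occurrences = collections.defaultdict(int)
    for c in word:
        occurrences[c] += 1
    return occurrences


occ = letter_frequencies('ASDqasdqASD')
# dict() only changes how the result prints (no defaultdict wrapper)
print(dict(occ))
# Counter produces the same counts
print(dict(occ) == dict(collections.Counter('ASDqasdqASD')))  # True
```

Because defaultdict is a dict subclass, it already compares equal to a plain dict with the same items; the dict() conversion is only for display.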