Question

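I need to create a function that takes a text file as input and returns a vector of size 26 holding the frequency of each letter (a to z) as a percentage. It must be case-insensitive. All other letters (e.g. å) and symbols should be ignored.

I tried using some of the answers here, especially Jacob's answer: Determining Letter Frequency Of Cipher Text

Here is my code so far: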

def letterFrequency(filename):
    #f: the text file is converted to lowercase 
    f=filename.lower()
    #n: the sum of the letters in the text file
    n=float(len(f))
    import collections
    dic=collections.defaultdict(int)
    #the absolute frequencies
    for x in f:
        dic[x]+=1
    #the relative frequencies
    from string import ascii_lowercase
    for x in ascii_lowercase:
        return x,(dic[x]/n)*100

For example, if I try this:

print(letterFrequency('I have no idea'))
>>> ('a',14.285714)

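Why doesn't it print the relative values for all the letters, including letters that are not in the string, such as z in my example?

How can I make my code print a vector of size 26?

Edit: I tried using Counter, but it prints ('a': 14.2857) with the letters in mixed order. I only need the relative frequencies of the letters, in alphabetical order!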

Answer 1

for x in ascii_lowercase:
    return x,(dic[x]/n)*100

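The function returns during the first iteration of the loop, so only the value for 'a' ever comes back.

Instead, change it to build and return a list of tuples: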

letters = []
for x in ascii_lowercase:
    letters.append((x,(dic[x]/n)*100))
return letters
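
Put together, the corrected function could look like the sketch below. It keeps the question's approach of dividing by the total length of the string, so spaces and other symbols still count toward the denominator. The name letter_frequency and the empty-input guard are my own additions for illustration.

```python
from collections import defaultdict
from string import ascii_lowercase


def letter_frequency(text):
    """Return a list of 26 (letter, percentage) tuples for a to z."""
    text = text.lower()
    if not text:
        # Avoid dividing by zero on empty input.
        return [(letter, 0.0) for letter in ascii_lowercase]
    n = float(len(text))  # counts every character, as in the question
    counts = defaultdict(int)
    for ch in text:
        counts[ch] += 1
    letters = []
    for letter in ascii_lowercase:
        letters.append((letter, counts[letter] / n * 100))
    return letters  # returned once, after the loop has finished


result = letter_frequency('I have no idea')
print(len(result))  # 26
print(result[0])    # the tuple for 'a'
```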

Answer 2

The problem is in the for loop:

for x in ascii_lowercase:
    return x,(dic[x]/n)*100

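return exits the function immediately, so the loop stops on its first iteration and you get back a single tuple.

Use yield instead of return; this turns the function into a generator that works as expected.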
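
As a rough sketch of the yield variant (the function name is made up for illustration, and dividing by the full string length follows the question's code):

```python
from collections import defaultdict
from string import ascii_lowercase


def iter_letter_frequency(text):
    """Yield one (letter, percentage) tuple per letter, a to z."""
    text = text.lower()
    n = float(len(text)) or 1.0  # guard against empty input
    counts = defaultdict(int)
    for ch in text:
        counts[ch] += 1
    for letter in ascii_lowercase:
        # yield hands back one value and resumes here on the next request
        yield letter, counts[letter] / n * 100


# Calling the function only creates the generator; list() drains it.
pairs = list(iter_letter_frequency('I have no idea'))
print(len(pairs))  # 26
```

Note that the caller has to iterate over the result (or wrap it in list()) to see all 26 values.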

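Another way to make it work is to return a list comprehension (the tuple needs its own parentheses inside the comprehension):

return [(x,(dic[x]/n)*100) for x in ascii_lowercase]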

However, if your goal is to count items, I recommend using the Counter class:

def letterFrequency(txt):
    from collections import Counter
    from string import ascii_lowercase
    c=Counter(txt.lower())
    n=len(txt)/100.
    return [(x, c[x]/n) for x in ascii_lowercase]

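As you can see, c=Counter(txt.lower()) does all the work of iterating over the characters and keeping count. A Counter behaves like a defaultdict(int): a missing key simply counts as 0.

Note that Counter also has other useful methods, such as c.most_common() ...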
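
The question also asks for file input, for characters outside a to z to be ignored, and for a plain vector of 26 numbers. A minimal sketch under those assumptions follows (Python 3). The function names, the utf-8 default, and the choice to divide by the number of a-z letters rather than by all characters are my own assumptions, not something the question pins down.

```python
from collections import Counter
from string import ascii_lowercase


def letter_percentages(text):
    """Return 26 floats: the percentage of each letter a to z in text."""
    # Keep only the ASCII letters a-z; å, digits, spaces etc. are dropped.
    counts = Counter(ch for ch in text.lower() if ch in ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 26
    return [counts[letter] * 100.0 / total for letter in ascii_lowercase]


def letter_percentages_from_file(filename, encoding='utf-8'):
    """Read a text file and compute the percentages (encoding is assumed)."""
    with open(filename, encoding=encoding) as f:
        return letter_percentages(f.read())


vector = letter_percentages('I have no idea')
print(len(vector))  # 26
```

With this denominator the 26 values sum to 100 whenever the text contains at least one letter, which is not the case when dividing by len(txt).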

Answer 3

Hey! Here is your solution. I used the text of your question as the example :) (All I did was put it in a string called string.)

import re

string = ' your text '
alpha_count = {}
string = string.lower()
for alpha in string:
    # count only the ASCII letters a-z
    if re.search("[a-z]", alpha):
        if alpha in alpha_count:
            alpha_count[alpha] += 1
        else:
            alpha_count[alpha] = 1
print(alpha_count)

Output:

{'i': 18,
 'n': 18,
 'e': 47,
 'd': 7,
 't': 32,
 'o': 18,
 'c': 13,
 'r': 21,
 'a': 19,
 'f': 10,
 'u': 8,
 'h': 12,
 'k': 1,
 's': 19,
 'x': 3,
 'l': 9,
 'p': 4,
 'v': 3,
 'z': 2,
 'w': 3,
 'q': 2,
 'y': 4,
 'm': 6,
 'b': 4,
 'g': 2,
 'j': 1}

If you want a sorted view, run this code after the code above:

alpha_items = alpha_count.items()
sorted_items = sorted(alpha_items)
print(sorted_items)

Result:

[('a', 19), ('b', 4), ('c', 13), ('d', 7), ('e', 47), ('f', 10), ('g', 2), ('h', 12), ('i', 18), ('j', 1), ('k', 1), ('l', 9), ('m', 6), ('n', 18), ('o', 18), ('p', 4), ('q', 2), ('r', 21), ('s', 19), ('t', 32), ('u', 8), ('v', 3), ('w', 3), ('x', 3), ('y', 4), ('z', 2)]
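
The counts above are absolute. To get the relative frequencies the question asks for, including zeros for letters that never occur, something like the following sketch should work. The small alpha_count dict here is made-up sample data standing in for the one built above.

```python
from string import ascii_lowercase

# Hypothetical counts standing in for the alpha_count built above.
alpha_count = {'a': 2, 'b': 1, 'z': 1}

total = sum(alpha_count.values())
# .get(letter, 0) supplies a zero for letters that never occurred.
percentages = [
    alpha_count.get(letter, 0) * 100.0 / total if total else 0.0
    for letter in ascii_lowercase
]
print(percentages[0])  # 50.0
```

Iterating over ascii_lowercase rather than over the dict is what guarantees alphabetical order and exactly 26 entries.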

Thanks!!!

Determining relative letter frequency

3 answers: