Question

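I'm trying to build a tool that finds the frequency of letters in some type of ciphertext. Let's assume it is all lowercase a-z with no numbers. The encoded message is in a txt file.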

I'm trying to build a script to help crack substitution or possibly transposition ciphers.

Code so far:

# read the whole ciphertext; the with-block closes the file afterwards
with open('cipher.txt') as f:
    cipher = f.read()
cipherfilter = cipher.lower()
cipherletters = list(cipherfilter)

alpha = list('abcdefghijklmnopqrstuvwxyz')
occurrences = {}
for letter in alpha:
    occurrences[letter] = cipherfilter.count(letter)
for letter in occurrences:
    print(letter, occurrences[letter])

So far all it does is show how many times each letter appears. How do I print the frequency of all the letters found in this file?

Answer 1

import collections

d = collections.defaultdict(int)
for c in 'test':
    d[c] += 1

print(d)  # defaultdict(<class 'int'>, {'t': 2, 'e': 1, 's': 1})

From a file:

with open('test.txt') as myfile:
    for line in myfile:
        line = line.rstrip('\n')
        for c in line:
            d[c] += 1

For the genius that is the defaultdict container, we must give thanks and praise. Otherwise we'd all be doing something silly like this:

s = "andnowforsomethingcompletelydifferent"
d = {}
for letter in s:
    if letter not in d:
        d[letter] = 1
    else:
        d[letter] += 1

Answer 2

The modern way:

from collections import Counter

string = "ihavesometextbutidontmindsharing"
Counter(string)
#>>> Counter({'i': 4, 't': 4, 'e': 3, 'n': 3, 's': 2, 'h': 2, 'm': 2, 'o': 2, 'a': 2, 'd': 2, 'x': 1, 'r': 1, 'u': 1, 'b': 1, 'v': 1, 'g': 1})
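For frequency analysis it usually helps to see the letters ranked. A minimal sketch, reusing the same example string (the names counts and total are just illustrative), that lists the top few letters with their share of the text:

```python
from collections import Counter

string = "ihavesometextbutidontmindsharing"
counts = Counter(string)
total = sum(counts.values())  # 32 characters in this example

# most_common(4) yields the four (letter, count) pairs with the highest counts
for letter, count in counts.most_common(4):
    print(letter, count, round(count / total, 3))
```

Note that this counts every character in the string, so whitespace or punctuation would need to be filtered out first if the input contains any.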

Answer 3

If you want to know the relative frequency of a letter c, you have to divide the number of occurrences of c by the length of the input.

For instance, taking Adam's example:

s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37

and storing the absolute frequency of each letter in dict[letter], we obtain the relative frequencies by:

from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print(c, dict[c] / float(n))

Putting it all together, we get something like this:

# get input
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37

# get absolute frequencies of letters
import collections
dict = collections.defaultdict(int)
for c in s:
    dict[c] += 1

# print relative frequencies
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print(c, dict[c] / float(n))
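If the input may contain spaces, newlines or punctuation, dividing by len(s) would skew the results, since those characters inflate the total. A rough sketch of one way around that, assuming only a-z should count; relative_frequencies is a made-up helper name, not something from a library:

```python
from collections import Counter
from string import ascii_lowercase

def relative_frequencies(text):
    """Return {letter: share of all a-z letters in text} for every letter a..z."""
    # keep only a-z so whitespace and punctuation do not affect the total
    counts = Counter(c for c in text.lower() if c in ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        # no letters at all: avoid dividing by zero
        return {c: 0.0 for c in ascii_lowercase}
    return {c: counts[c] / total for c in ascii_lowercase}

freqs = relative_frequencies("andnowforsomethingcompletelydifferent")

# show the five most frequent letters first, which is the usual starting
# point when guessing the mapping of a substitution cipher
ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
for letter, share in ranked[:5]:
    print(letter, round(share, 3))
```

To run it on the file from the question, read the text with open('cipher.txt') and pass the contents to the function instead of the sample string.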

Determining the letter frequency of a ciphertext
