Getting the letter frequencies in a sentence

Time: 2018-05-19 18:01:13

Tags: python list count frequency

I'm trying to write code where I can enter a random sentence and count how many times a letter occurs in that string:

def getfreq(lines):
    """ calculate a list with letter frequencies

    lines - list of lines (character strings)

    both lower and upper case characters are counted.
    """
    totals = 26*[0]
    chars = []
    for line in lines:
        for ch in line:
            chars.append(ch)

    return totals

    # convert totals to frequency
    freqlst = []
    grandtotal = sum(totals)

    for total in totals:
        freq = totals.count(chars)
        freqlst.append(freq)
    return freqlst

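So far I have managed to add each letter (character) of the input to a list. But now I need a way to count how many times a character occurs in that list and express it as a frequency.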
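A minimal sketch of how getfreq could be finished, assuming "frequency" means each letter's share of all letters counted (so the 26 values sum to 1) and that only the ASCII letters a to z should be counted, with upper and lower case merged as the docstring says:

```python
import string

def getfreq(lines):
    """Return 26 relative letter frequencies, index 0 = 'a' ... 25 = 'z'.

    Upper- and lower-case letters are counted together; spaces, digits,
    punctuation and non-ASCII letters are ignored (an assumption).
    """
    totals = 26 * [0]
    for line in lines:
        for ch in line.lower():
            # Only the 26 ASCII letters map to a slot in totals.
            if ch in string.ascii_lowercase:
                totals[ord(ch) - ord('a')] += 1

    grandtotal = sum(totals)
    if grandtotal == 0:
        return [0.0] * 26  # no letters at all: avoid dividing by zero
    # Convert absolute counts to frequencies that sum to 1.0.
    return [total / grandtotal for total in totals]

freqs = getfreq(["A long sentence", "may contain repeated letters"])
print(round(freqs[ord('e') - ord('a')], 4))  # 0.2105
```

Indexing with ord(ch) - ord('a') keeps the fixed 26-slot list from the original code, so no separate chars list is needed.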

3 answers:

Answer 0 (score: 1)

The collections module has a very handy class, Counter, that counts how often each object occurs in a sequence:

import collections
collections.Counter('A long sentence may contain repeated letters')

This produces:

Counter({' ': 6,
         'A': 1,
         'a': 3,
         'c': 2,
         'd': 1,
         'e': 8,
         'g': 1,
         'i': 1,
         'l': 2,
         'm': 1,
         'n': 5,
         'o': 2,
         'p': 1,
         'r': 2,
         's': 2,
         't': 5,
         'y': 1})

In your case, you will probably want to join your lines first, e.g. with ''.join(lines), before passing the result to Counter.
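For example, a sketch that joins the lines, keeps only letters, folds case with lower() (an assumption; drop it to keep 'A' and 'a' separate) and converts the counts into relative frequencies. The lines list here is just example input:

```python
from collections import Counter

lines = ["A long sentence", "may contain repeated letters"]  # example input

text = ''.join(lines).lower()  # one string; case folded so 'A' and 'a' match
counts = Counter(ch for ch in text if ch.isalpha())  # letters only
total = sum(counts.values())
freqs = {ch: n / total for ch, n in counts.items()}  # shares summing to 1.0

print(counts['e'], total)  # 8 38
```

Joining with '' glues the last word of one line to the first word of the next, which does not matter here because only letters are counted.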

If you want a similar result using a plain dictionary, you could do something like this:

counts = {}
for c in my_string:
    counts[c] = counts.get(c, 0) + 1

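Depending on your Python version this may be slower. It uses the dict's .get() method to return the existing count (or the default 0 if the character has not been seen yet) and then increments the count for each character in the string.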
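Whether the plain-dict loop is actually slower on your interpreter is easy to check with the standard timeit module. The sketch below (function names are hypothetical) first confirms that both approaches produce the same counts and then times them; no particular result is implied, since timings vary by machine and Python version:

```python
import timeit
from collections import Counter

my_string = "A long sentence may contain repeated letters" * 100

def count_with_get(s):
    """Plain-dict counting via dict.get(), as in the answer."""
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
    return counts

def count_with_counter(s):
    """The same counts via collections.Counter."""
    return Counter(s)

# Timing only makes sense if both approaches give the same answer.
assert count_with_get(my_string) == dict(count_with_counter(my_string))

for fn in (count_with_get, count_with_counter):
    seconds = timeit.timeit(lambda: fn(my_string), number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 runs")
```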

Answer 1 (score: 1)

Without collections.Counter:

import collections

sentence = "A long sentence may contain repeated letters"

count = collections.defaultdict(int)  # missing keys default to int(), i.e. 0
for letter in sentence:  # iterate over each character in the sentence
    count[letter] += 1  # increase the count for this character

Or, if you really want to do it completely by hand:

sentence = "A long sentence may contain repeated letters"

count = {}  # a counting dictionary
for letter in sentence:  # iterate over each character in the sentence
    count[letter] = count.get(letter, 0) + 1  # get the current value and increase by 1

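In both cases, the count dictionary will have each distinct letter as a key, and the corresponding value will be the number of times that letter was encountered, e.g.: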

print(count["e"])  # 8

If you want the count to be case-insensitive, be sure to call letter.lower() when adding the letter to the count.
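A sketch of that case-insensitive variant using defaultdict, which also (as an extra assumption not in the answer) skips non-letters such as spaces:

```python
from collections import defaultdict

sentence = "A long sentence may contain repeated letters"

count = defaultdict(int)  # missing keys start at 0
for letter in sentence:
    if letter.isalpha():            # skip spaces (assumption: letters only)
        count[letter.lower()] += 1  # 'A' and 'a' share the key 'a'

print(count["a"])    # 4
print("A" in count)  # False
```

The membership test with `in` does not create a key, unlike indexing a defaultdict with a missing key.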

Answer 2 (score: 0)

You can use a set to reduce the text to its unique characters and then simply count each of them:

text = ' '.join(lines)  # Create one long string
# Then create a set of all unique characters in the text
characters = {char for char in text if char.isalpha()}
statistics = {}         # Create a dictionary to hold the results
for char in characters: # Loop through unique characters
    statistics[char] = text.count(char) # and count them
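The same idea, made case-insensitive and turned into percentages, could look like the sketch below (the lines list is hypothetical example input). Each text.count() call scans the whole string, so this makes one pass per distinct letter; with at most 26 letters that is usually acceptable, whereas Counter needs only a single pass.

```python
lines = ["A long sentence", "may contain repeated letters"]  # hypothetical input

text = ' '.join(lines).lower()  # lower() so 'A' and 'a' are counted together
characters = {char for char in text if char.isalpha()}
statistics = {char: text.count(char) for char in characters}

total = sum(statistics.values())
# Highest count first; ties broken alphabetically.
for char, n in sorted(statistics.items(), key=lambda item: (-item[1], item[0])):
    print(f"{char}: {n} ({100 * n / total:.1f}%)")
```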