Calculating letter frequency with Python

Time: 2015-08-31 04:32:20

Tags: python frequency frequency-analysis word-frequency

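I need to define a function that finds every index at which a given character occurs in a string, sums those indices, divides the sum by the number of times the character occurs, and then divides that result by the length of the text.

Here is what I have so far: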

def ave_index(char):
  passage = "string"
  if char in passage:
    word = passage.split(char)
    words = len(word)
    number = passage.count(char)
    answer = word / number / len(passage)
    return(answer)

  elif char not in passage:
    return False

So far, the answers I get from this are completely off the mark.

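Edit: The passage we were given to use as a string - 'Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'

When char = 's' the answer should be 0.5809489252885479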

2 answers:

Answer 0: (score: 2)

You can use Counter to check the frequency:

from collections import Counter
words = 'The passage we were given to use as a string - Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people\'s hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'

freqs = Counter(list(words)) # list(words) returns a list of all the characters in words, then Counter will calculate the frequencies 
print(float(freqs['s']) / len(words)) 
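
As a smaller illustration of the same idea, here is a minimal sketch on a short sample string (the string `"hello world"` and the variable names are my own choices, not part of the question). Note that, as far as I can tell, this measures how often a character occurs rather than where it occurs, so it would not by itself reproduce the position-based value the question expects.

```python
from collections import Counter

text = "hello world"      # illustrative sample, not the passage from the question
counts = Counter(text)    # a string is already iterable character by character

# Relative frequency of 'l': occurrences divided by total length
# (true division under Python 3).
relative_freq = counts["l"] / len(text)
print(counts["l"], relative_freq)
```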

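Answer 1: (score: 1)

The problem is in how you are counting the letters. Take the string hello world, and suppose you are trying to count how many l's it contains. We know there are 3 l's, but if you do a split: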

>>> s.split('l')
['he', '', 'o wor', 'd']

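This gives a count of 4, because split returns one more piece than there are separators. In addition, we need the position of each instance of the character in the string.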
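
To make the off-by-one concrete, a quick sketch comparing the number of pieces from `split` with `str.count` on the same sample string (variable names here are only for illustration):

```python
s = "hello world"
c = "l"

pieces = s.split(c)
print(len(pieces))    # number of pieces: one more than the number of separators
print(s.count(c))     # number of non-overlapping occurrences of c
```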

The built-in enumerate can help us here:

>>> s = 'hello world'
>>> c = 'l'  # The letter we are looking for
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results
[2, 3, 9]

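Now we have the total number of occurrences, len(results), as well as the positions in the string where the letter occurs.

The last "trick" to this problem is to make sure you divide by a float so that you get the correct result.

Run against your sample text (stored in s):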

>>> c = 's'
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results_sum = sum(results)
>>> (results_sum / len(results)) / float(len(s))
0.5804132973944295
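
The steps above can be put together as a function. This is only a sketch under a few assumptions of mine: the passage is passed in as a parameter instead of being hard-coded, `None` is returned when the character is absent (the question returned `False`), and the code runs under Python 3, where `/` is always true division. Under Python 2, `results_sum / len(results)` with two integers truncates, which may account for the small gap between 0.5804... above and the 0.5809... the question expects; I have not verified that against the exact passage.

```python
def ave_index(passage, char):
    """Mean index of char in passage, divided by the passage length."""
    # Collect every position at which char occurs.
    positions = [i for i, ch in enumerate(passage) if ch == char]
    if not positions:
        return None  # assumption: signal "not found" with None

    mean_position = sum(positions) / len(positions)  # true division in Python 3
    return mean_position / len(passage)


# For 'hello world' and 'l' this evaluates (2 + 3 + 9) / 3 / 11.
print(ave_index("hello world", "l"))
```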