Most common character in a string

Time: 2016-02-24 06:24:49

Tags: python algorithm python-3.x

  

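Write a function that takes a string consisting of alphabetic characters as input argument and returns the most common character. Ignore white space, i.e. do not count any white space as a character. Note that capitalization does not matter here, i.e. a lower case character is equal to an upper case character. In case of a tie between certain characters, return the last character that has the highest count.

Here is the updated code: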

def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    print(new_string)
    length = len(new_string)
    print(length)
    count = 1
    j = 0
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        character = new_string[i]
        while (length - 1):
            if (character == new_string[j + 1]):
                count += 1
            j += 1
            length -= 1    
            if (higher_count < count):
                higher_count = count
    return (character)     

#Main Program
input_str = input("Enter a string: ")
result = most_common_character(input_str)
print(result)

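The above is my code. I am getting a "string index out of bound" error and I cannot understand why. Also, the code only checks the occurrences of the first character. I am confused about how to move on to the next character and take the maximum count.

The error I get when running the code: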

> Your answer is NOT CORRECT Your code was tested with different inputs.
> For example when your function is called as shown below:
> 
> most_common_character ('The cosmos is infinite')
> 
> ############# Your function returns ############# e The returned variable type is: type 'str'
> 
> ######### Correct return value should be ######## i The returned variable type is: type 'str'
> 
> ####### Output of student print statements ###### thecosmosisinfinite 19

3 answers:

Answer 0: (score: 4)

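You can use a regex pattern to search for all characters. \w matches any alphanumeric character and the underscore; for ASCII text this is equivalent to the set [a-zA-Z0-9_]. A + after [\w] means match one or more repetitions.

Finally, you use Counter to count them and most_common(1) to get the highest value. See below for the tie case.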

from collections import Counter
import re

s = "Write a function that takes a string consisting of alphabetic characters as input argument and returns the most common character. Ignore white spaces i.e. Do not count any white space as a character. Note that capitalization does not matter here i.e. that a lower case character is equal to a upper case character. In case of a tie between certain characters return the last character that has the most count"

>>> Counter(c.lower() for c in re.findall(r"\w", s)).most_common(1)
[('t', 46)]
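
As a small illustration of the pattern choice described above (a sketch with a made-up input string, not part of the original answer), `re.findall` returns one match per character with `\w`, but one match per run of characters with `\w+`. Only the first form is suitable for counting individual letters:

```python
import re
from collections import Counter

text = "The the".lower()

single_chars = re.findall(r"\w", text)   # one match per character
whole_words = re.findall(r"\w+", text)   # one match per run of characters

print(single_chars)  # ['t', 'h', 'e', 't', 'h', 'e']
print(whole_words)   # ['the', 'the']

# Counting the first list counts letters; counting the second counts words.
print(Counter(single_chars)["e"])    # 2
print(Counter(whole_words)["the"])   # 2
```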

In the case of a tie, it is a bit trickier.

def top_character(some_string):
    # r"\w" yields single characters; r"\w+" would yield whole words
    joined_characters = re.findall(r"\w", some_string.lower())
    d = Counter(joined_characters)
    highest = max(d.values())
    top_characters = [c for c, n in d.items() if n == highest]
    if len(top_characters) == 1:
        return top_characters[0]
    # On a tie, scan from the end so the last tied character wins
    for c in reversed(joined_characters):
        if c in top_characters:
            return c

>>> top_character(s)
't'

>>> top_character('the the')
'e'

With the code above and your sentence "The cosmos is infinite", you can see that 'i' occurs more often than 'e' (the output of your function):

>>> Counter(c.lower() for c in "".join(re.findall(r"[\w]+", 'The cosmos is infinite'))).most_common(3)
[('i', 4), ('s', 3), ('e', 2)]

You can see the problem in this block of your code:

for i in range(0, len(new_string)):
    character = new_string[i]
    ...
return (character)     

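You are iterating over the sentence and assigning each letter to the variable character, which is never reassigned anywhere else. When the function returns, character therefore always holds the last character of the string.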
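
As a small illustration of this point (a sketch, not the asker's full function; the name `last_loop_value` is made up for this example), a variable assigned inside a `for` loop simply keeps whatever it was given on the final iteration:

```python
def last_loop_value(text):
    """Return the variable assigned in the loop, as the original code did."""
    character = ""
    for i in range(len(text)):
        character = text[i]  # overwritten on every iteration
    return character  # always the last character of text


print(last_loop_value("thecosmosisinfinite"))  # e
```

This matches the 'e' reported by the grader: it is the last letter of the input, not the most frequent one.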

Answer 1: (score: 2)

Actually, your code is almost correct. You need to move count, j and length inside for i in range(0, len(new_string)), because they have to start over on every iteration. Also, whenever count is at least as large as higher_count, you need to save that character as return_character, and return return_character instead of character, which is always the last character of the string because of character = new_string[i].

I don't understand why you used j+1 and while length-1. After correcting them, the code now covers the tie case as well.

def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        count = 0
        length = len(new_string)
        j = 0
        character = new_string[i]
        while length > 0:
            if (character == new_string[j]):
                count += 1
            j += 1
            length -= 1
        if (higher_count <= count):
            higher_count = count
            return_character = character
    return (return_character)
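
The nested loop in this answer rescans the whole string for every character. As a possible alternative, here is a minimal sketch that counts each character once with a plain dictionary and then walks the string backwards to resolve ties. The function name `most_common_last` is hypothetical, and "last" is assumed to mean the tied character that occurs latest in the string:

```python
def most_common_last(input_str):
    """Most frequent non-space character, ignoring case.

    On a tie, the tied character that occurs last in the string wins
    (an assumption about what "last" means in the exercise).
    """
    chars = "".join(input_str.lower().split())  # drop all white space
    if not chars:
        return ""

    counts = {}
    for ch in chars:  # single counting pass
        counts[ch] = counts.get(ch, 0) + 1

    best = max(counts.values())
    for ch in reversed(chars):  # the first hit from the end is the last in the string
        if counts[ch] == best:
            return ch


print(most_common_last("The cosmos is infinite"))  # i
print(most_common_last("the the"))                 # e
```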

Answer 2: (score: 1)

If we ignore the "tie" requirement, collections.Counter() works:

from collections import Counter
from itertools import chain

def most_common_character(input_str):
    return Counter(chain(*input_str.casefold().split())).most_common(1)[0][0]

Example:

>>> most_common_character('The cosmos is infinite')
'i'
>>> most_common_character('ab' * 3)
'a'
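
To make the `chain(*...)` step less opaque, here is a short sketch (with a made-up input) of what it produces: `split()` removes the white space and yields words, and chaining the words flattens them into individual letters. `chain.from_iterable` is used here as an equivalent spelling that avoids unpacking the list into arguments:

```python
from collections import Counter
from itertools import chain

words = "The cosmos".casefold().split()      # ['the', 'cosmos']
letters = list(chain.from_iterable(words))   # same result as chain(*words)

print(letters)                # ['t', 'h', 'e', 'c', 'o', 's', 'm', 'o', 's']
print(Counter(letters)["s"])  # 2
```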

To return the last character that has the highest count, we can use collections.OrderedDict:

from collections import Counter, OrderedDict
from itertools import chain
from operator import itemgetter

class OrderedCounter(Counter, OrderedDict):
    pass

def most_common_character(input_str):
    counter = OrderedCounter(chain(*input_str.casefold().split()))
    return max(reversed(counter.items()), key=itemgetter(1))[0]

Example:

>>> most_common_character('ab' * 3)
'b'
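
The tie-breaking here relies on how `max()` treats equal keys: it returns the first maximal item it encounters. A minimal sketch with made-up counts shows why reversing the input flips which tied item wins:

```python
from operator import itemgetter

counts = [("a", 3), ("b", 3)]  # two characters tied at the same count

first = max(counts, key=itemgetter(1))           # first maximal item wins
last = max(reversed(counts), key=itemgetter(1))  # reversed input: last one wins

print(first)  # ('a', 3)
print(last)   # ('b', 3)
```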

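Note: this solution assumes that max() returns the first character that has the highest count (hence the reversed() call, to get the last one) and that all characters are single Unicode code points. In general, you might want to use the \X regular expression (supported by the regex module) to extract "user-perceived characters" (extended grapheme clusters) from a Unicode string.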
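
To illustrate the code point caveat with the standard library only (a sketch; it does not use the regex module or \X), the same visible letter can be stored as one code point or as two, and a per-code-point count treats the two forms differently. NFC normalization merges the pair in this example, but it only helps for characters that have a precomposed form, so it is not a general substitute for grapheme cluster matching:

```python
import unicodedata
from collections import Counter

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(len(decomposed))           # 2
print(len(composed))             # 1
print(len(Counter(decomposed)))  # 2, the accent is counted as its own "character"
```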