Question

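Complete the exercise of finding the most common letter in a string, excluding punctuation; the result should be lowercase. So for the example "HHHHello World!!!!!!!!!!" the result should be "h".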

What I have so far is:

text=input('Insert String: ')
def mwl(text):
    import string
    import collections
    for p in text:
        p.lower()
    for l in string.punctuation:
        for x in text:
            if x==l:
                text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])

mwl(text)

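I would appreciate help understanding why:

The case is not changed to lowercase in text
The punctuation is not removed from the text string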

Answer 1

There are a few problems:

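Strings are immutable. This means that methods such as lower() and replace() return the result and leave the original string unchanged. You need to assign that return value somewhere.
lower() can operate on the whole string: text = text.lower().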
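
As a minimal sketch of the immutability point above (the variable names `original`, `unchanged` and `cleaned` are only for illustration and are not from the question):

```python
original = "HHHHello World!"

# str methods return a new string; the original object is left untouched.
original.lower()
original.replace("!", "")
unchanged = original  # still "HHHHello World!"

# Binding a name to the returned value is what keeps the result.
cleaned = original.lower().replace("!", "")
print(unchanged)  # HHHHello World!
print(cleaned)    # hhhhello world
```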

For some ideas on how to remove punctuation characters from a string, see Best way to strip punctuation from a string in Python

Answer 2

You can try this:

>>> import re
>>> from collections import Counter
>>> my_string = "HHHHello World!!!!!!!!!!"
>>> Counter("".join(re.findall("[a-z]+",my_string.lower()))).most_common(1)
[('h', 4)]

Answer 3

text = input('Insert String: ')

from string import punctuation
from collections import Counter
def mwl(text):
    st = set(punctuation)
    # remove all punctuation and make every letter lowercase
    filtered = (ch.lower() for ch in text if ch not in st)
    # make counter dict from remaining letters and return the most common
    return Counter(filtered).most_common()[0][0]

Or use str.translate to remove the punctuation:

from string import punctuation
from collections import Counter
def mwl(text):
    # the third argument of str.maketrans lists the characters to delete
    text = text.lower().translate(str.maketrans("", "", punctuation))
    return Counter(text).most_common()[0][0]
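
To make the deletion step easier to see in isolation, here is a small sketch of str.translate with a deletion table. The names `delete_punct` and `stripped` are hypothetical. Note that spaces are not part of string.punctuation, so they survive this step.

```python
from string import punctuation

# With two empty strings as the first arguments, str.maketrans builds a
# table that only deletes the characters listed in the third argument.
delete_punct = str.maketrans("", "", punctuation)

stripped = "HHHHello World!!!!!!!!!!".lower().translate(delete_punct)
print(stripped)  # hhhhello world
```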

Using your own code, you need to reassign text to the updated string:

def mwl(text):
    import string
    import collections
    text = text.lower() 
    for l in string.punctuation:
        for x in text:
            if x == l:
                text = text.replace(x,'')
    collist=collections.Counter(text).most_common(1)
    print(collist[0][0])

Instead of looping over the text in your code, you can just use:

for l in string.punctuation:
    if l in text:
        text = text.replace(l,'')

Answer 4

The first problem is that you never actually assign anything.

 p.lower()

only returns a lowercase version of p. It does not set p to the lowercase version. It should be

p = p.lower()

The same goes for text.replace(x, ''). It should be text = text.replace(x, '')
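
One caveat worth illustrating: inside a for loop, assigning to the loop variable rebinds only that name and does not modify the string being iterated. The following sketch (with a hypothetical `after_loop` variable) suggests that reassigning the whole string is what actually changes text:

```python
text = "HHHHello World"

# Rebinding the loop variable only changes the local name p,
# not the characters stored in text.
for p in text:
    p = p.lower()
after_loop = text  # still "HHHHello World"

# Lowercasing the whole string and reassigning it does change text.
text = text.lower()
print(after_loop)  # HHHHello World
print(text)        # hhhhello world
```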

Answer 5

You can do this:

>>> from collections import Counter
>>> from string import ascii_letters
>>> tgt="HHHHello World!!!!!!!!!!" 
>>> Counter(c.lower() for c in tgt if c in ascii_letters).most_common(1)
[('h', 4)]
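
If the one-liner is wrapped in a function, it may be worth guarding against input that contains no letters, since most_common(1) returns an empty list for an empty Counter. A sketch, where the function name `most_common_letter` and the choice to return None are assumptions rather than part of the answer:

```python
from collections import Counter
from string import ascii_letters

def most_common_letter(text):
    """Return the most frequent ASCII letter in lowercase, or None if there is none."""
    counts = Counter(c.lower() for c in text if c in ascii_letters)
    # most_common(1) returns [] for an empty Counter, so indexing
    # [0][0] directly would raise IndexError on letter-free input.
    top = counts.most_common(1)
    return top[0][0] if top else None

print(most_common_letter("HHHHello World!!!!!!!!!!"))  # h
print(most_common_letter("!!! ???"))                   # None
```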

Answer 6

If the input is ASCII-only, you can use bytes.translate() to convert it to lowercase and remove the punctuation:

#!/usr/bin/env python3
from string import ascii_uppercase, ascii_lowercase, punctuation

table = b''.maketrans(ascii_uppercase.encode(), ascii_lowercase.encode())
def normalize_ascii(text, todelete=punctuation.encode()):
    return text.encode('ascii', 'strict').translate(table, todelete)

s = "HHHHello World!!!!!!!!!!"

count = [0]*256 # number of all possible bytes
for b in normalize_ascii(s): count[b] += 1 # count bytes
# print the most common byte
print(chr(max(range(len(count)), key=count.__getitem__)))

If you want to count letters in non-ASCII Unicode text, you can use the .casefold() method (proper caseless comparison) and the remove_punctuation() function:

#!/usr/bin/env python3
from collections import Counter
import regex # $ pip install regex

def remove_punctuation(text):
    return regex.sub(r"\p{P}+", "", text)

s = "HHHHello World!!!!!!!!!!"
no_punct = remove_punctuation(s)
characters = (c.casefold() for c in regex.findall(r'\X', no_punct))
print(Counter(characters).most_common(1)[0][0])

The r'\X' regex is used to count user-perceived characters rather than just Unicode code points.
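
If installing the third-party regex module is not an option, a rougher standard-library sketch is possible. It is a different technique from the one above: it keeps only characters whose Unicode general category starts with "L" (letters) and counts code points, not user-perceived characters, so combining sequences are not handled. The function name `most_common_unicode_letter` is hypothetical.

```python
import unicodedata
from collections import Counter

def most_common_unicode_letter(text):
    """Return the most frequent letter after case folding, or None if there is none."""
    # Categories Lu, Ll, Lt, Lm and Lo all start with "L"; punctuation,
    # digits and whitespace fall into other categories and are skipped.
    letters = (c.casefold() for c in text
               if unicodedata.category(c).startswith("L"))
    top = Counter(letters).most_common(1)
    return top[0][0] if top else None

# Escapes are used so the sample text is precomposed regardless of editor.
sample = "\u00c9\u00c9\u00e9, \u00e9t\u00e9!"
print(most_common_unicode_letter(sample) == "\u00e9")  # True
```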
