Average number of characters per word in a list

Time: 2014-02-25 16:49:54

Tags: python regex string python-3.x

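I'm new to Python, and I need to calculate the average number of characters per word in a list,

using these definitions and the helper function clean_up.

A token is a str obtained by calling a string method (str.split, per the next definition) on a line of the file.

A word is a non-empty token from the file that is not made up entirely of punctuation. Find the tokens with str.split, then use the helper function clean_up to remove punctuation from them; what remains are the file's "words".

A sentence is a sequence of characters terminated by (but not including) one of the characters !, ?, . or the end of the file. It excludes whitespace on either end and is not empty.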
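Read literally, the token and word definitions above amount to something like the following sketch. The name words_in is hypothetical, and the punctuation set inside clean_up is an assumption based on the helper shown in this thread.

```python
def clean_up(s):
    # Assumed punctuation set: the helper's characters plus newline and backslash.
    punctuation = "!\"',;:.-?)([]<>*#\n\\"
    return s.lower().strip(punctuation)

def words_in(lines):
    """Hypothetical helper: return the cleaned, non-empty words in lines."""
    words = []
    for line in lines:
        for token in line.split():   # tokens come from str.split
            word = clean_up(token)   # strip surrounding punctuation
            if word:                 # drop tokens that were only punctuation
                words.append(word)
    return words

print(words_in(['Peter, Paul -- and Mary!\n']))
# ['peter', 'paul', 'and', 'mary']
```

The "--" token is all punctuation, so it cleans to an empty string and is not counted as a word.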

This is a homework problem from my university computer science class.

The clean_up function is:

def clean_up(s):
    # Trailing backslash is escaped so the literal terminates (assumed intent).
    punctuation = """!"',;:.-?)([]<>*#\n\\"""
    result = s.lower().strip(punctuation)
    return result

My code is:

def average_word_length(text):
    """ (list of str) -> float

    Precondition: text is non-empty. Each str in text ends with \n and at
    least one str in text contains more than just \n.

    Return the average length of all words in text. Surrounding punctuation
    is not counted as part of the words. 


    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
    >>> average_word_length(text)
    5.142857142857143 
    """

    for ch in text:
        word = ch.split()
        clean = clean_up(ch)
        average = len(clean) / len(word)
    return average

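I get 5.0, and I'm really confused. Some help would be much appreciated :) PS: I'm using Python 3.

2 answers:

Answer 0 (score: 6)

Let's clean up some of these functions with an import and generator expressions, shall we?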

import string

def clean_up(s):
    # I'm assuming you REQUIRE this function as per your assignment
    # otherwise, just substitute str.strip(string.punctuation) anywhere
    # you'd otherwise call clean_up(str)
    return s.strip(string.punctuation)

def average_word_length(text):
    total_length = sum(len(clean_up(word)) for sentence in text for word in sentence.split())
    num_words = sum(len(sentence.split()) for sentence in text)
    return total_length/num_words

You may notice that this actually reduces to one long, unreadable one-liner:

average = sum(len(word.strip(string.punctuation)) for sentence in text for word in sentence.split()) / sum(len(sentence.split()) for sentence in text)

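That's pretty disgusting, which is why you shouldn't do it ;). Readability and all that.

Answer 1 (score: 5)

Here is a short way to solve your problem that is still readable.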
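For what it's worth, the 5.0 in the question can be reproduced by hand. The sketch below assumes the question's clean_up strips newline and backslash along with the listed punctuation. The loop in the question overwrites average on every pass, so only the last line counts, and it measures the whole cleaned line (spaces and inner comma included) rather than each word.

```python
def clean_up(s):
    # Assumed punctuation set, matching the helper in the question.
    punctuation = "!\"',;:.-?)([]<>*#\n\\"
    return s.lower().strip(punctuation)

text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']

# What the question's loop effectively returns: the last line only.
last_line = text[-1]
cleaned_line = clean_up(last_line)                 # 'peter, paul and mary'
tokens = last_line.split()                         # 4 tokens
question_result = len(cleaned_line) / len(tokens)  # 20 / 4

# What the docstring asks for: clean each word, then average over all words.
word_lengths = [len(clean_up(w)) for line in text for w in line.split()]
expected = sum(word_lengths) / len(word_lengths)   # 36 / 7

print(question_result)  # 5.0
print(expected)         # 5.142857142857143
```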

def clean_up(word, punctuation="!\"',;:.-?)([]<>*#\n\\"):
    return word.lower().strip(punctuation)  # you don't really need ".lower()"

def average_word_length(text):
    cleaned_words = [clean_up(w) for w in (w for l in text for w in l.split())]
    return sum(map(len, cleaned_words))/len(cleaned_words)  # Python2 use float

>>> average_word_length(['James Fennimore Cooper\n', 'Peter, Paul and Mary\n'])
5.142857142857143

The burden of satisfying all those preconditions falls on you.
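If you would rather fail loudly than trust the caller, the preconditions from the docstring could be checked up front. This is only a sketch: check_preconditions is a hypothetical helper, not part of the assignment, and raising ValueError is just one possible choice.

```python
def check_preconditions(text):
    """Hypothetical helper: raise ValueError if text breaks the docstring's preconditions."""
    if not text:
        raise ValueError("text must be non-empty")
    if not all(line.endswith("\n") for line in text):
        raise ValueError("every str in text must end with a newline")
    if not any(line != "\n" for line in text):
        raise ValueError("at least one str must contain more than just a newline")

check_preconditions(['James Fennimore Cooper\n', 'Peter, Paul and Mary\n'])  # passes silently
```

Note that a text such as ['   \n'] satisfies the preconditions as written but contains no tokens, so the division in average_word_length would still raise ZeroDivisionError.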