Python: average number of phrases per sentence

Time: 2014-03-10 23:10:35

Tags: python average sentence phrases

I was given these two functions.

def split_on_separators(original, separators):
    """ (str, str) -> list of str

    Return a list of non-empty, non-blank strings from the original string
    determined by splitting the string on any of the separators.
    separators is a string of single-character separators.

    >>> split_on_separators("Hooray! Finally, we're done.", "!,")
    ['Hooray', ' Finally', " we're done."]
    """

    # To do: Complete this function's body to meet its specification.
    # You are not required to keep the two lines below but you may find
    # them helpful. (Hint)
    for i in separators:
        original = original.replace(i, "<*)))>{")
        ret = original.split("<*)))>{")
    return ret

def clean_up(s):
    """ (str) -> str

    Return a new string based on s in which all letters have been
    converted to lowercase and punctuation characters have been stripped
    from both ends. Inner punctuation is left untouched.

    >>> clean_up('Happy Birthday!!!')
    'happy birthday'
    >>> clean_up("-> It's on your left-hand side.")
    " it's on your left-hand side"
    """

    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    result = s.lower().strip(punctuation)
    return result

I am supposed to return the average number of phrases per sentence. This is the function I wrote:

def avg_sentence_complexity(text):
    """ (list of str) -> float

    Return the average number of phrases per sentence.

    A sentence is defined as a non-empty string of non-terminating
    punctuation surrounded by terminating punctuation
    or beginning or end of file. Terminating punctuation is defined as !?.
    Phrases are substrings of sentences, separated by one or more of the
    following delimiters ,;:

    >>> text = ['The time has come, the Walrus said\n',
    ...      'To talk of many things: of shoes - and ships - and sealing wax,\n',
    ...      'Of cabbages; and kings.\n',
    ...      'And why the sea is boiling hot;\n',
    ...      'and whether pigs have wings.\n']
    >>> avg_sentence_complexity(text)
    3.5
    """

    huge_str = ''
    clean_sentences = []
    for lines in text:
        huge_str += lines
    list_of_sentences = split_on_separators(huge_str, '?!.')
    for strings in list_of_sentences:
        cleaned = clean_up(strings)
        clean_sentences.append(cleaned)
        if '' in clean_sentences:
            clean_sentences.remove('')
    num_sentences = len(clean_sentences)

    large = ''
    for phrases in text:
        large += phrases
    list_of_phrases = split_on_separators(large, ',;:')
    num_phrases = len(list_of_phrases)

    asc = num_phrases / num_sentences
    return asc

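This only gives me 3.0, which is the total number of phrases divided by the total number of sentences. My question is: how do I compute (total phrases in the first sentence)/(total sentences) + (total phrases in the second sentence)/(total sentences) + ...?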

1 answer:

Answer 0 (score: 1)

Technically, what you describe is just computing 1/total_sentences * num_phrases, which equals num_phrases/total_sentences, since each phrase just counts as 1, as far as I can tell.
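As a quick check of that identity, the snippet below compares the two ways of computing the average. The counts `[5, 2]` are an assumption worked out by hand from the docstring example (5 phrases in the first sentence, 2 in the second). They are not produced by the question's code.

```python
import math

# Hand-counted phrases per sentence for the docstring example (assumed values).
counts = [5, 2]
num_sentences = len(counts)

# Sum of (phrases in sentence i) / (total sentences) ...
sum_of_ratios = sum(c / num_sentences for c in counts)
# ... versus (total phrases) / (total sentences).
ratio_of_sums = sum(counts) / num_sentences

print(sum_of_ratios, ratio_of_sums)  # 3.5 3.5
```

Both expressions give the same number, so the formula in the question is not what produces the difference between 3.0 and 3.5. The difference comes from how the phrases are counted.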

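What you actually want to do is count the number of phrases in each sentence. You can then use numpy.mean on the list of phrase counts to find the average phrase count.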

I won't be more specific than that, since this is obviously a homework assignment :p
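Purely as an illustration of the per-sentence counting idea, here is a minimal sketch. It makes some assumptions that are not in the question: it uses `re.split` instead of the question's `split_on_separators`, it treats any piece that is only whitespace as empty, and it uses the standard library's `statistics.mean` in place of `numpy.mean` so that it runs without extra packages. The name `avg_phrases_per_sentence` is hypothetical.

```python
import re
from statistics import mean


def avg_phrases_per_sentence(text):
    """Return the mean number of phrases per sentence for a list of lines."""
    whole = "".join(text)
    # Split on terminating punctuation and drop whitespace-only pieces.
    sentences = [s for s in re.split(r"[!?.]", whole) if s.strip()]
    if not sentences:
        return 0.0  # assumption: report 0.0 when there are no sentences
    # Count the non-blank phrases inside each sentence separately.
    counts = [
        len([p for p in re.split(r"[,;:]", s) if p.strip()])
        for s in sentences
    ]
    return mean(counts)


text = ['The time has come, the Walrus said\n',
        'To talk of many things: of shoes - and ships - and sealing wax,\n',
        'Of cabbages; and kings.\n',
        'And why the sea is boiling hot;\n',
        'and whether pigs have wings.\n']
print(avg_phrases_per_sentence(text))  # 3.5
```

Splitting the whole text on `,;:` gives only 6 pieces, because " and kings.\nAnd why the sea is boiling hot" runs across a sentence boundary and is counted once. Splitting each sentence separately gives 5 + 2 = 7 phrases, which is where 3.5 comes from.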