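I have to use Python to print the number of words and the mean word length for each sentence in a text file. I am not allowed to use NLTK or regex for this task.
Sentences in the file end with a period, an exclamation mark, or a question mark. Hyphens, dashes, and apostrophes do not end a sentence, and neither do quotation marks. However, some periods do not end a sentence either: Mrs., Mr., Dr., Fr., Jr., and St. are all common abbreviations.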
For example, for a given input text the output should have this form:
[(no. of words, mean length of words in sentence1),
 (no. of words, mean length of words in sentence2),
 ...]
This is my code so far:
p = ("Mrs.", "Mr.", "St.")

def punct_after_ab(texts):
    new_text = texts
    for abb in p:
        new_text = new_text.replace(abb, abb[:-1])
    return print(new_text)

import numpy

def word_list(text):
    special_characters = ["'", ","]
    clean_text = text
    for string in special_characters:
        clean_text = clean_text.replace(string, "")
    count_list = [len(i) for i in clean_text.split()]
    count = [numpy.mean(count_list)]
    return print((count_list), (count))
But when I test it, it does not split the text into sentences.
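Answer 0 (score: 0)
Use .split(' ') on the contents of each line to separate the words (which, in the case you describe, are delimited by spaces), then use array operations and basic math/statistics to get your answer. If you update your question to be more specific and include some of your own code, I am willing to revise my answer accordingly.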
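As a rough illustration of that suggestion, here is a minimal sketch that computes the word count and mean word length for a single sentence. The function name sentence_stats and the choice to strip punctuation from the edges of each word are my own assumptions, not something the question specifies. It uses .split() with no argument rather than .split(' '), because the latter yields empty strings when two spaces appear in a row.

```python
def sentence_stats(sentence):
    """Return (word count, mean word length) for one sentence.

    Hypothetical helper: the name and the punctuation handling are
    assumptions made for this sketch.
    """
    # split() with no argument splits on any run of whitespace,
    # whereas "a  b".split(' ') would return ['a', '', 'b'].
    words = [w.strip(".,!?\"'") for w in sentence.split()]
    # Drop tokens that consisted only of punctuation.
    words = [w for w in words if w]
    if not words:
        return (0, 0.0)
    lengths = [len(w) for w in words]
    return (len(words), sum(lengths) / len(lengths))


print(sentence_stats("Your name?"))  # (2, 4.0)
```

Whether punctuation should count toward word length is a decision for the assignment; the sketch assumes it should not.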
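You will find that on this site you do not get very useful answers if you have not put much effort into your question. Before asking, try to do some research and write as much of the code as you can. That makes it easier for people to help you, and they will be more willing to. As it stands, it looks like you just want someone to do your homework for you.
Update:
Your code mostly works; only a few things need to change. I played around with what you had and was able to split the text into an array of sentences, from which you can go on to compute the statistics.
input.txt: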
My name? Mr. Bob. Your name? Mrs. Lily!
What's up?
test.py (I am using Python 3.6):
def punct_after_ab(texts):
    # Drop the trailing period from abbreviations so it is not
    # mistaken for the end of a sentence
    p = ("Mrs.", "Mr.", "St.")
    new_text = texts
    for abb in p:
        new_text = new_text.replace(abb, abb[:-1])
    return new_text


def clean_text(text):
    # Remove apostrophes and commas
    special_characters = ["'", ","]
    cleaned = text
    for string in special_characters:
        cleaned = cleaned.replace(string, "")
    return cleaned


def split_sentence(text):
    # Initialize vars
    sentences = []
    start = 0

    # Loop through the text until you find punctuation,
    # then add the sentence to the final array
    for i, char in enumerate(text):
        if char in ".?!":
            sentences.append(text[start:i + 1])
            # Skip the single space or newline that follows the punctuation
            start = i + 2

    # Print the sentences to console
    for sentence in sentences:
        print(sentence)


def main():
    # Ask user for file name
    file = input("Enter file name: ")

    # Read the file and strip leading/trailing newline chars
    with open(file) as f:
        fd = f.read()
    fd = fd.strip("\n")

    # Remove punctuation that doesn't delineate sentences
    text = punct_after_ab(fd)
    text = clean_text(text)

    # Separate sentences
    split_sentence(text)


# Run program
if __name__ == '__main__':
    main()
I was able to get the following output:
Enter file name: input.txt
My name?
Mr Bob.
Your name?
Mrs Lily!
Whats up?
Process finished with exit code 0
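From there you can easily compute the per-sentence statistics. I just typed this up, so you may want to go through it and clean it up a bit. I hope this helps.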
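To make that last step concrete, here is one possible sketch that returns the sentences as a list instead of printing them and then builds the list of (word count, mean word length) tuples the question asks for. The names ABBREVIATIONS, split_sentences, and text_stats are hypothetical, and the abbreviation list is simply the one given in the question. It follows the same idea as the code above (drop the abbreviation periods, then cut at '.', '!' or '?'), but accumulates characters rather than assuming exactly one space after each sentence.

```python
# Abbreviations whose period should not end a sentence (taken from the question).
ABBREVIATIONS = ("Mrs.", "Mr.", "Dr.", "Fr.", "Jr.", "St.")


def split_sentences(text):
    """Split text into sentences ending in '.', '!' or '?'."""
    # Drop the period from each known abbreviation first.
    for abb in ABBREVIATIONS:
        text = text.replace(abb, abb[:-1])

    sentences = []
    current = []
    for char in text:
        current.append(char)
        if char in ".!?":
            candidate = "".join(current).strip()
            # Skip fragments with no letters or digits, e.g. the extra dots of "...".
            if any(c.isalnum() for c in candidate):
                sentences.append(candidate)
            current = []

    # Keep any trailing text that has no closing punctuation.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


def text_stats(text):
    """Return [(word count, mean word length), ...], one tuple per sentence."""
    result = []
    for sentence in split_sentences(text):
        # Remove apostrophes and commas, as clean_text does above.
        cleaned = sentence.replace("'", "").replace(",", "")
        words = [w.strip('.!?"') for w in cleaned.split()]
        words = [w for w in words if w]
        lengths = [len(w) for w in words]
        result.append((len(words), sum(lengths) / len(lengths)))
    return result


sample = "My name? Mr. Bob. Your name? Mrs. Lily!\nWhat's up?"
print(text_stats(sample))
```

For the input.txt contents shown earlier this prints [(2, 3.0), (2, 2.5), (2, 4.0), (2, 3.5), (2, 3.5)]. Two limitations are worth knowing about: an abbreviation that is the last word of a sentence loses the period that should end it, and a closing quotation mark after the punctuation ends up at the start of the next sentence.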