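I have an assignment that asks me to find the minimum and maximum number of words in a text file. I have finished three of the five questions; the remaining two ask for the minimum and the maximum, and I cannot work out how to compute them. Here is my code. Thanks for your help.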
lines, blanklines, sentences, words = 0, 0, 0, 0
print '-' * 50
full_text = 'input.txt'
empty_text = 'output.txt'
text_file = open(full_text, 'r')
out_file = open(empty_text, "w")
for line in text_file:
    print line
    lines += 1
    if line.startswith('\n'):
        blanklines += 1
    else:
        # assume that each sentence ends with . or ! or ?
        # so simply count these characters
        sentences += line.count('.') + line.count('!') + line.count('?')
        # create a list of words
        # split() with no argument splits at any whitespace regardless of length
        # so for instance double space counts as one space
        # word total count
        words += len(line.split())
# compute the average once, after the loop; avoid dividing by zero
average = float(words) / sentences if sentences else 0.0
text_file.close()
out_file.close()
######## T E S T P R O G R A M ########
print
print '-' * 50
print "Total number of sentences in the input file : ", sentences
print "Total number of words in the input file : ", words
print "Average number of words per sentence : ", average
Answer 0 (score: 0)
You can use a regex to find the words and split the text into sentences:
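import re

thefilepath = 'input.txt'  # same input file as in the question
text = open(thefilepath).read()
# every word in the file
words = re.findall(r"[\w'-]+", text)
# split on '.', keeping only pieces that contain at least one word
sentences = [s for s in re.split(r"\.", text) if re.findall(r"[\w'-]+", s)]
summ = 0
for s in sentences:
    words_in_sent = re.findall(r"[\w'-]+", s)
    summ += len(words_in_sent)
# assumes the file contains at least one sentence
print "Total number of sentences in the input file :{0}\n and Total number of words in the input file: {1}\n and average of words in each sentence is :{2} ".format(len(sentences), len(words), float(summ) / len(sentences))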
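The question asks for the minimum and maximum word counts, which the snippet above does not print. One possible extension of the regex approach is to keep the per-sentence counts in a list and apply `min()` and `max()` to it. This is only a sketch: it assumes that "minimum and maximum" means words per sentence, that sentences end with `.`, `!` or `?` (as the question's code assumes), and the helper name `sentence_word_counts` and the sample text are made up for illustration.

```python
import re

# Same word pattern as in the answer above.
WORD_RE = re.compile(r"[\w'-]+")

def sentence_word_counts(text):
    """Return the number of words in each sentence of text (hypothetical helper)."""
    # Assumption: a sentence ends with '.', '!' or '?'.
    pieces = re.split(r"[.!?]", text)
    counts = [len(WORD_RE.findall(piece)) for piece in pieces]
    # Drop pieces without words, e.g. the empty piece after the last terminator.
    return [c for c in counts if c > 0]

# Illustrative sample text, not taken from the question's input file.
sample = "Hello world. This is a longer sentence! Short?"
counts = sentence_word_counts(sample)

if counts:
    shortest = min(counts)   # fewest words in any sentence
    longest = max(counts)    # most words in any sentence
    average = float(sum(counts)) / len(counts)
```

To apply it to the file from the question, the text could be read with `open('input.txt').read()` and passed to the helper in place of `sample`.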
Answer 1 (score: 0)
Use collections.Counter, a data type designed for this purpose:
>>> from collections import Counter
>>> lines="""
... foo bar baz hello world foo
... a b c z d
... 0 foo 1 bar"""
>>> counter = Counter()
>>>
>>> for line in lines.split("\n"):
... counter.update(line.split())
...
>>> print counter.most_common(1) #print max
[('foo', 3)]
>>> print counter.most_common()[-1] #print min
('hello', 1)
>>> print len(list(counter.elements())) #print total words
15
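In the session above, ten different words occur exactly once, so `most_common()[-1]` shows only one of several words tied for last place, and which one it shows may depend on the Python version. If every tied word matters, a sketch along these lines collects them all; it also totals the words with `sum(counter.values())` instead of building the full element list. The function name `word_frequencies` is hypothetical, and the sample text is the one from the session.

```python
from collections import Counter

def word_frequencies(text):
    """Count how often each whitespace-separated word occurs (hypothetical helper)."""
    counter = Counter()
    for line in text.splitlines():
        counter.update(line.split())
    return counter

lines = """
foo bar baz hello world foo
a b c z d
0 foo 1 bar"""

counter = word_frequencies(lines)

# Total number of words, without materializing counter.elements().
total_words = sum(counter.values())

# Highest and lowest occurrence counts.
highest = max(counter.values())
lowest = min(counter.values())

# All words tied for the highest and lowest counts, sorted for stable output.
most_frequent = sorted(w for w, n in counter.items() if n == highest)
least_frequent = sorted(w for w, n in counter.items() if n == lowest)
```

Note that this measures how often each word occurs; it does not give the number of words per sentence or per line.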