Question

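I have an assignment that asks me to find the minimum and maximum number of words in a text file. I have finished three of the five parts; the remaining two ask for the minimum and the maximum, and I can't work out how to do them. Here is my code. Thanks for your help.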

lines, blanklines, sentences, words = 0, 0, 0, 0
print '-' * 50
full_text = 'input.txt'
empty_text = 'output.txt'

text_file = open(full_text, 'r')
out_file = open(empty_text, "w")


for line in text_file:
  print line
  lines += 1

  if line.startswith('\n'):
    blanklines += 1
  else:
    # assume that each sentence ends with . or ! or ?
    # so simply count these characters
    sentences += line.count('.') + line.count('!') + line.count('?')

    # create a list of words
    # use None to split at any whitespace regardless of length
    # so for instance double space counts as one space
    # word total count
    words += len(line.split())
average = float(words) / float(sentences)



text_file.close()
out_file.close()

######## T E S T   P R O G R A M ########

print
print '-' * 50
print "Total number of sentences in the input file  : ", sentences
print "Total number of words in the input file      : ", words
print "Average number of words per sentence         : ", average

Answer 1

You can use a regex to find the words:

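import re

thefilepath = 'input.txt'  # assumed path; the question reads from input.txt

sentences = 0  # number of non-empty sentences
words = 0      # total number of words

for line in open(thefilepath):
    # split the line into sentences at each period
    for s in re.split(r"\.", line):
        words_in_sent = re.findall(r"[\w'-]+", s)
        if words_in_sent:  # skip empty fragments, e.g. the text after the last period
            sentences += 1
            words += len(words_in_sent)

print("Total number of sentences in the input file :{0}\n and Total number of words in the input file: {1}\n and average of words in each sentence is :{2} ".format(sentences, words, float(words) / sentences))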
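
The snippet above gives totals and an average, but the question asks for the minimum and maximum. One way to get there, as a rough sketch rather than a definitive solution, is to keep the word count of every sentence in a list and then call `min()` and `max()` on it. The sketch below is written for Python 3 (the other snippets in this thread use Python 2 print statements). The function name `words_per_sentence` and the sample string are made up for illustration, and it assumes, as the question's code does, that a sentence ends with `.`, `!` or `?`.

```python
import re


def words_per_sentence(text):
    """Return a list holding the word count of each non-empty sentence."""
    # Assumption: sentences end with '.', '!' or '?', as in the question.
    fragments = re.split(r"[.!?]", text)
    counts = []
    for fragment in fragments:
        # Same word pattern as the regex answer: letters, digits, ' and -
        words = re.findall(r"[\w'-]+", fragment)
        if words:  # ignore empty fragments, e.g. after the final period
            counts.append(len(words))
    return counts


# Hypothetical sample text, only for demonstration.
sample = "Hello world. This is a longer sentence! Short one?"
counts = words_per_sentence(sample)

print("Words per sentence:", counts)
print("Minimum words in a sentence:", min(counts))
print("Maximum words in a sentence:", max(counts))
print("Average words per sentence:", sum(counts) / len(counts))
```

For a real file you could pass in the whole contents, for example `words_per_sentence(open('input.txt').read())`. Reading the full text instead of going line by line also handles sentences that run across several lines. Note that `min()` and `max()` raise `ValueError` on an empty list, so check that at least one sentence was found first.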

Answer 2

Use collections.Counter, a data type designed for this purpose:

>>> from collections import Counter
>>> lines="""
... foo bar baz hello world foo
... a b c z d
... 0 foo 1 bar"""
>>> counter = Counter()
>>> 
>>> for line in lines.split("\n"):
...     counter.update(line.split())
... 
>>> print counter.most_common(1) #print max
[('foo', 3)]
>>> print counter.most_common()[-1] #print min
('hello', 1)
>>> print len(list(counter.elements()))  #print total words
15

Find the maximum and minimum number of words per sentence in an input file
