Not counting the characters in a text file

Time: 2015-06-04 07:19:27

Tags: python file python-3.x io count

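I'm writing another program that uses text file I/O, and I'm confused: my code seems perfectly reasonable, but the results are crazy. I want to count the number of words, characters, sentences and unique words in text files of political speeches. Here is my code, which should make it a bit clearer.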

#This program will serve to analyze text files for the number of words in
#the text file, number of characters, sentences, unique words, and the longest
#word in the text file. This program will also provide the frequency of unique
#words. In particular, the text will be three political speeches which we will
#analyze, building on searching techniques in Python.
#CISC 101, Queen's University
#By Damian Connors; 10138187

def main():
    harper = readFile("Harper's Speech.txt")
    print(numCharacters(harper), "Characters.")
    obama1 = readFile("Obama's 2009 Speech.txt")
    print(numCharacters(obama1), "Characters.")
    obama2 = readFile("Obama's 2008 Speech.txt")
    print(numCharacters(obama1), "Characters.")

def readFile(filename):
    '''Function that reads a text file, then prints the name of the file without
'.txt'. The function returns the read file for main() to call, and prints
the file's name so the user knows which file is read'''
    inFile1 = open(filename, "r")
    fileContentsList = inFile1.readlines()
    inFile1.close()
    print(filename.replace(".txt", "") + ":")  #this prints filename
    return fileContentsList

def numCharacters(file):
    return len(file) - file.count(" ")

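What I'm having trouble with right now is counting characters. It keeps saying the count is 85, but it's a very large file and I know it should be 7792 characters. Any idea what I'm doing wrong? Here is my shell output; I'm using Python 3.3.3.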

>>> ================================ RESTART ================================
>>> 
Harper's Speech:
85 Characters.
Obama's 2009 Speech:
67 Characters.
Obama's 2008 Speech:
67 Characters.
>>> 

So you can see I have 3 speech files, but they certainly don't have that few characters.

4 answers:

Answer 0 (score: 1)

You should change this line: fileContentsList = inFile1.readlines(). Right now you are counting how many lines there are in the speech. Change readlines() to read() and it will work.
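A minimal sketch of that change, assuming the speech files are UTF-8 encoded (the encoding argument is an assumption, not part of the original code). The names read_file and num_characters are illustrative stand-ins for the question's readFile and numCharacters. Because read() returns a single string, len() counts characters instead of lines:

```python
def read_file(filename):
    # read() returns the whole file as one string, not a list of lines
    with open(filename, encoding="utf-8") as stream:
        return stream.read()


def num_characters(text):
    # Count every character except spaces; newline characters are still counted
    return len(text) - text.count(" ")


# Usage (assumes the file exists in the working directory):
# print(num_characters(read_file("Harper's Speech.txt")), "Characters.")
```

This version still counts the "\n" at the end of each line, so whether it reproduces the expected 7792 depends on how that number was obtained.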

Answer 1 (score: 0)

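The readlines function returns a list containing the lines, so its length will be the number of lines in the file, not the number of characters.

You have to find a way to read all the characters (so that the length is correct), for example with read().

Or go through each line and count the characters in it, perhaps something like:

def numCharacters(file):
    tot = 0
    for line in file:
        # len(line) includes the trailing "\n", so line endings are counted too
        tot = tot + len(line) - line.count(" ")
    return tot

(assuming the actual method you chose for counting characters is correct).

By the way, your third output statement refers to obama1 instead of obama2; you may want to fix that too.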
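Whether the line-by-line method above is "correct" depends on what should count as a character. As a hedged sketch, if the expected total excludes all whitespace (spaces, tabs and line endings) rather than only spaces, str.isspace() can do the filtering; count_non_whitespace is a hypothetical name:

```python
def count_non_whitespace(lines):
    # Count characters that are not whitespace of any kind:
    # spaces, tabs and "\n" line endings are all skipped.
    total = 0
    for line in lines:
        total += sum(1 for ch in line if not ch.isspace())
    return total


# Accepts any iterable of lines: an open file object or the list from readlines()
sample_lines = ["Yes we can.\n", "Yes\twe did.\n"]
print(count_non_whitespace(sample_lines))
```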

Answer 2 (score: 0)

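You are counting lines. More precisely, you read the file into a list of lines and then count the items in that list. Here is a cleaned-up version of your code: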

def count_lines(filename):
    with open(filename) as stream:
        return len(stream.readlines())

The simplest change to this kind of code to count words is to read the whole file, split it into words and count them, as in the following code.

def count_words(filename):
    with open(filename) as stream:
        return len(stream.read().split())

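Notes:

  • The code may need updating to match your exact definition of a word.
  • This approach is not suitable for very large files, because it reads the entire file into memory, and the list of words is stored there as well.

So the code above is more of a conceptual model than an optimal final solution.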
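Regarding the memory note, a hedged sketch: iterating over the file object reads one line at a time, and because split() with no argument treats newlines as whitespace, summing the per-line counts gives the same total as len(stream.read().split()). The helper count_words_in_lines is hypothetical, and UTF-8 encoding is assumed:

```python
def count_words_in_lines(lines):
    # Sum per-line word counts; only the current line is held in memory
    total = 0
    for line in lines:
        # split() with no argument splits on runs of any whitespace
        total += len(line.split())
    return total


def count_words(filename):
    # Iterating over the file object yields one line at a time
    with open(filename, encoding="utf-8") as stream:
        return count_words_in_lines(stream)
```

A file consisting of one enormous line would still be loaded in full, so this only helps when the text is broken into reasonably sized lines.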

Answer 3 (score: 0)

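What you are currently seeing is the number of lines in the file. Since readlines() returns a list, fileContentsList is a list, and numCharacters returns the size of that list.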

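If you want to keep using readlines, you need to count the characters in each line and add them up to get the total number of characters in the file.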

def main():
    print(readFile("Harper's Speech.txt"), "Characters.")
    print(readFile("Obama's 2009 Speech.txt"), "Characters.")
    print(readFile("Obama's 2008 Speech.txt"), "Characters.")

def readFile(filename):
    '''Function that reads a text file, prints the name of the file without
'.txt' so the user knows which file is read, and returns the total number
of characters in the file, excluding spaces and line endings'''
    inFile1 = open(filename, "r")
    fileContentsList = inFile1.readlines()
    inFile1.close()
    totalChar =0    # Variable to store total number of characters
    for line in fileContentsList:    # reading all lines
        line = line.rstrip("\n")    # removing line end character '\n' from lines
        totalChar = totalChar + len(line) - line.count(" ")    # adding number of characters in line to total characters,
                                                               # minus the number of spaces in the current line
    print(filename.replace(".txt", "") + ":")  #this prints filename
    return totalChar

main() # calling main function.