Count the number of characters in every word of every line of a file

Asked: 2015-07-12 17:53:07

Tags: python text count counter word-count

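This code prints the total number of lines, words, and characters in a text file. It works fine and gives the expected output. But I want to count the number of characters in each line and print it like this: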

Line No. 1 has 58 Characters
Line No. 2 has 24 Characters

Code:

import string
def fileCount(fname):
    #counting variables
    lineCount = 0
    wordCount = 0
    charCount = 0
    words = []

    #file is opened and assigned a variable
    infile = open(fname, 'r')

    #loop that finds the number of lines in the file
    for line in infile:
        lineCount = lineCount + 1
        word = line.split()
        words = words + word

    #loop that finds the number of words in the file
    for word in words:
        wordCount = wordCount + 1
        #loop that finds the number of characters in the file
        for char in word:
            charCount = charCount + 1
    #returns the variables so they can be called to the main function        
    return(lineCount, wordCount, charCount)

def main():
    fname = input('Enter the name of the file to be used: ')
    lineCount, wordCount, charCount = fileCount(fname)
    print ("There are", lineCount, "lines in the file.")
    print ("There are", charCount, "characters in the file.")
    print ("There are", wordCount, "words in the file.")
main()

The loop

for line in infile:
    lineCount = lineCount + 1

counts the total number of lines, but how do I do this for each individual line? I am using Python 3.x.

4 Answers:

Answer 0: (score: 1)

Store all the information in a dict, then access it by key.

This approach only works for words separated by whitespace, so keep that in mind.
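
A minimal sketch of what the dict-based idea could look like in Python, the language of the question. The function name `per_line_stats` and the dict keys are hypothetical, chosen only for illustration; this is not the answerer's original code. It relies on str.split(), so only whitespace separates words.

```python
def per_line_stats(lines):
    """Map each 1-based line number to a dict of counts for that line.

    `lines` is any iterable of strings, e.g. an open text file.
    The function name and dict keys are illustrative assumptions.
    """
    stats = {}
    for line_no, line in enumerate(lines, start=1):
        words = line.split()  # splits on any run of whitespace
        stats[line_no] = {
            "words": len(words),
            # characters inside words only; whitespace is not counted
            "chars": sum(len(word) for word in words),
            "word_lengths": [len(word) for word in words],
        }
    return stats


sample = ["hello big world\n", "\n", "a bc\n"]
stats = per_line_stats(sample)
for line_no, info in stats.items():
    print("Line No.", line_no, "has", info["chars"], "Characters")
```

With a real file, the open file object can be passed directly, for example `with open(fname) as infile: stats = per_line_stats(infile)`.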

Answer 1: (score: 0)

Define a set of the allowed characters you want to count; then you can use len to get most of the data.
Below, I have chosen this character set:

['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~']
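
The character set listed here appears to correspond to the printable ASCII characters from '!' to '~'. Assuming that is the intent, the standard string module can build the same set. The helper name `count_allowed` is hypothetical and the answer's own code is not reproduced; this is only a sketch of counting characters that belong to an allowed set.

```python
import string

# Printable, non-whitespace ASCII: assumed to be the set the answer lists.
ALLOWED = set(string.digits + string.ascii_letters + string.punctuation)


def count_allowed(line, allowed=ALLOWED):
    """Count the characters of `line` that belong to `allowed`."""
    return sum(1 for ch in line if ch in allowed)


for line_no, line in enumerate(["Hello, world!\n", "  a b\tc \n"], start=1):
    print("Line No.", line_no, "has", count_allowed(line), "Characters")
```

Because spaces, tabs, and newlines are not in the set, they are skipped without any call to strip() or split().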

Answer 2: (score: -1)

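I was assigned the task of writing a program that prints the number of characters in each line.

As a programming newbie, I found this very difficult :(.

Here is what I came up with, along with his response:

This is the core part of your program: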

with open('data_vis_tips.txt', 'r') as inFile:
    with open('count_chars_per_line.txt', 'w') as outFile:
        for line in inFile:
            # drop the trailing newline so it is not counted
            line = line.strip('\n')
            outFile.write(str(len(line)) + '\n')

It can be simplified to:

with open ('data_vis_tips.txt', 'r') as inFile:
    for line in inFile:
        line = line.strip()
        num_chars = len(line)
        print(num_chars)

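Note that the argument to strip() is not required; by default it strips whitespace, and '\n' is whitespace. Be aware that strip() with no argument also removes leading and trailing spaces and tabs, so the count can be lower than with strip('\n').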
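
To get output in the format the question asks for ("Line No. 1 has 58 Characters"), the loop can be combined with enumerate. This is a sketch rather than part of the original answer; the function names `chars_per_line` and `report` are hypothetical. It uses rstrip('\n') so that only the trailing newline is dropped and spaces or tabs still count as characters.

```python
def chars_per_line(lines):
    """Yield (line_number, character_count) for each line."""
    for line_no, line in enumerate(lines, start=1):
        # Remove only the trailing newline so spaces and tabs still count.
        yield line_no, len(line.rstrip('\n'))


def report(fname):
    """Print one 'Line No. N has M Characters' message per line of a file."""
    with open(fname, 'r') as infile:
        for line_no, count in chars_per_line(infile):
            print("Line No.", line_no, "has", count, "Characters")


# Demonstration on in-memory lines instead of a real file.
for line_no, count in chars_per_line(["  two spaces\n", "tab\tend"]):
    print("Line No.", line_no, "has", count, "Characters")
```

If whitespace should not be counted, replacing the len(...) expression with sum(len(word) for word in line.split()) gives the characters inside words only.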

Answer 3: (score: -1)

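Here is a simpler version using the built-in collections.Counter, a specialized dict that counts its inputs. We can use the Counter.update() method to take in all the words on each line (unique or not):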

from collections import Counter

def file_count_2(fname):

    line_count = 0
    word_counter = Counter()

    with open(fname, 'r') as infile:
        for line in infile:
            line_count += 1
            word_counter.update(line.split())

    word_count = 0
    char_count = 0

    for word, cnt in word_counter.items():
        word_count += cnt
        char_count += cnt * len(word)

    print(word_counter)

    return line_count, word_count, char_count

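Notes:

  • I tested this and it gives the same counts as your code.
  • It will be faster, because you are not repeatedly appending to the list words (it is better to hash only the unique words and store their counts, which is what Counter does), and there is no need to iterate and increment charCount every time a word occurs.
  • If you only want word_count and not char_count, you can directly take word_count = sum(word_counter.values()) without iterating over word_counter.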
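
A small self-contained illustration of the last note, using made-up sample lines rather than a real file. Both totals are derived from the Counter without revisiting the individual words. Note that a single Counter aggregates over the whole input, so on its own it does not give per-line figures.

```python
from collections import Counter

# Sample input standing in for the lines of a file (illustrative data).
lines = ["the cat and the hat\n", "the end\n"]

word_counter = Counter()
for line in lines:
    word_counter.update(line.split())

# Total words: add up the per-word occurrence counts.
word_count = sum(word_counter.values())
# Total characters in words: each unique word's length times its count.
char_count = sum(cnt * len(word) for word, cnt in word_counter.items())

print(word_count, char_count)
print(word_counter.most_common(1))
```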