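Average length of words in a file, with punctuation removed (Python 3)

Time: 2018-11-24 16:54:54

Tags: python python-3.x file loops

I am trying to remove punctuation so that I can calculate the average length of the words in a text file. Can someone tell me where I am going wrong?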

name = "/Users/Desktop/name.txt"
punct = "!()-[]{};:'\,<>./?@#$%^&*_~"
no_punct = ""
textfile = open(name, "r")
letter_count1 = 0
letter_count2 = 0
for line in textfile:
    for word in line.split():
        for c in word:
            if c not in punct:
                no_punct = no_punct + c
                letter_count1 += 1
                letter_count2 += len(word)   

avg = float(letter_count2)/float(letter_count1)
print("Average words: ", avg)

textfile.close()

3 answers:

Answer 0 (score: 1)

You can also use a regular expression to remove anything that is not a word character or whitespace:

import re

num_words = 0
num_chars = 0

with open("/Users/Desktop/name.txt", "r") as file:
    for line in file:
        clean = re.sub(r'[^\w\s]', '', line)
        words = clean.split()

        # Operate on list of words...
        num_words += len(words)
        for w in words:
            num_chars += len(w)

    avg = num_chars / num_words
    print("Average word length: {}".format(avg))
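
To see what the pattern keeps and drops, here is a small sketch on a made-up sample string (the string is an assumption for illustration, not taken from the question's file). One detail worth noticing: `\w` also matches the underscore, so `_` survives the substitution even though it appears in the question's `punct` string.

```python
import re

# Hypothetical sample line, invented for illustration only.
sample = "Hello, world! It's a snake_case-name."

# Drop every character that is neither a word character nor whitespace.
clean = re.sub(r"[^\w\s]", "", sample)
words = clean.split()

print(clean)
print(words)
```

If underscores should be removed as well, a pattern such as `[^\w\s]|_` would cover them.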

Answer 1 (score: 0)

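The problem with your code is that you loop over every character of each word ("for c in word:") to check whether it is one of the extra characters in "punct". So, for example, when you are checking the word "somewo?rd", "letter_count1" is incremented for every character in that word except "?". You can fix this by using all() with a generator expression to check that none of the characters in "punct" appears in the current word. Note that this counts only words that contain no punctuation at all; a word such as "somewo?rd" is skipped rather than cleaned.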

name = "/Users/Desktop/name.txt"
# The backslash is escaped so the string contains no invalid escape sequence.
punct = "!()-[]{};:'\\,<>./?@#$%^&*_~"

letter_count1 = 0  # number of words counted
letter_count2 = 0  # total number of characters in those words
with open(name, "r") as textfile:
    for line in textfile:
        for word in line.split():
            # Count the word only if it contains no punctuation character.
            if all(i not in word for i in punct):
                letter_count1 += 1
                letter_count2 += len(word)

avg = letter_count2 / letter_count1
print("Average word length: ", avg)
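
The difference between skipping words that contain punctuation and stripping the punctuation out of them can be seen in a minimal sketch. The helper names `avg_skipping` and `avg_stripping` and the sample sentence are hypothetical, introduced only for this comparison; the `punct` string is the one from the question.

```python
punct = "!()-[]{};:'\\,<>./?@#$%^&*_~"


def avg_skipping(words):
    """Average length over words that contain no punctuation at all."""
    kept = [w for w in words if all(p not in w for p in punct)]
    return sum(len(w) for w in kept) / len(kept) if kept else 0.0


def avg_stripping(words):
    """Average length after deleting punctuation characters from each word."""
    stripped = ["".join(c for c in w if c not in punct) for w in words]
    stripped = [w for w in stripped if w]  # drop tokens that became empty
    return sum(len(w) for w in stripped) / len(stripped) if stripped else 0.0


# Hypothetical input, invented for illustration.
words = "Hi there, somewo?rd ok".split()
print(avg_skipping(words))
print(avg_stripping(words))
```

With skipping, only "Hi" and "ok" are counted; with stripping, all four words contribute once their punctuation is removed, so the two averages differ.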

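Answer 2 (score: 0)

I think that in your code "letter_count1" should hold the number of words, and "letter_count2" should hold the number of characters excluding punctuation. Check this out: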

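file_name = "/Users/Desktop/name.txt"  # path taken from the question
punct = "!()-+[]{};:'\\,<>./?@#$%^&*_~"
nwords = letters = 0
with open(file_name) as ff:
    for line in ff:
        for w in line.split():
            # Length of the word once punctuation characters are excluded.
            lth = sum(1 for c in w if c not in punct)
            if lth:  # skip tokens made up only of punctuation
                nwords += 1
                letters += lth

print(letters / nwords)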
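
The same counting can also be sketched with `str.translate`, assuming the standard library's `string.punctuation` is an acceptable definition of punctuation (it is not identical to the `punct` string above). The function name `average_word_length` is hypothetical; it accepts any iterable of lines, so an open file object works as well as a list of strings.

```python
import string

# Translation table that deletes every character in string.punctuation.
_STRIP = str.maketrans("", "", string.punctuation)


def average_word_length(lines):
    """Return the mean word length after stripping punctuation, or 0.0 if no words."""
    nwords = letters = 0
    for line in lines:
        for w in line.split():
            w = w.translate(_STRIP)
            if w:  # ignore tokens that were punctuation only
                nwords += 1
                letters += len(w)
    return letters / nwords if nwords else 0.0


# Hypothetical sample lines, invented for illustration.
result = average_word_length(["Hello, world!", "-- it's"])
print(result)
```

Returning 0.0 for input without any words is a design choice made here to avoid a ZeroDivisionError; raising an error instead would be equally reasonable.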