I am trying to remove punctuation so that I can calculate the average word length in a text file. Can anyone tell me where I am going wrong?
name = "/Users/Desktop/name.txt"
punct = "!()-[]{};:'\,<>./?@#$%^&*_~"
no_punct = ""
textfile = open(name, "r")
letter_count1 = 0
letter_count2 = 0
for line in textfile:
    for word in line.split():
        for c in word:
            if c not in punct:
                no_punct = no_punct + c
                letter_count1 += 1
                letter_count2 += len(word)
avg = float(letter_count2)/float(letter_count1)
print("Average words: ", avg)
textfile.close()
Answer 0 (score: 1)
You can also use a regular expression to remove anything that is not a word character or whitespace:
import re
num_words = 0
num_chars = 0
with open("/Users/Desktop/name.txt", "r") as file:
    for line in file:
        clean = re.sub(r'[^\w\s]', '', line)
        words = clean.split()
        # Operate on list of words...
        num_words += len(words)
        for w in words:
            num_chars += len(w)
avg = num_chars / num_words
print("Average word length: {}".format(avg))
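For experimenting without a file, the same regular expression can be wrapped in a small function that takes a string. This is only a sketch: the name average_word_length and the guard for empty input are assumptions added for illustration and are not part of the answer above.

```python
import re


def average_word_length(text):
    """Return the mean word length after removing punctuation.

    Hypothetical helper: returns 0.0 when the text contains no words,
    which avoids the ZeroDivisionError an empty file would cause.
    """
    # Drop every character that is neither a word character nor whitespace.
    clean = re.sub(r"[^\w\s]", "", text)
    words = clean.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


sample = "Hello, world!"
result = average_word_length(sample)
print("Average word length: {}".format(result))
```

Note that \w also matches the underscore, so _ is kept by this pattern even though it appears in the question's punct string.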
Answer 1 (score: 0)
The problem with your code is that you iterate over every character of every word with for c in word: to check whether it is one of the extra characters in punct. So, for example, when the word being checked is somewo?rd, letter_count1 is incremented for every character in that word except ?. You can fix this with a generator expression that checks that none of the characters in punct occurs in the current word, instead of looping over the characters. Note that this version skips any word that contains punctuation rather than stripping the punctuation from it:
name = 'name.txt'
punct = "!()-[]{};:'\\,<>./?@#$%^&*_~"
textfile = open(name, "r")
letter_count1 = 0  # number of words without punctuation
letter_count2 = 0  # total length of those words
for line in textfile:
    for word in line.split():
        # Count the word only if it contains no punctuation character.
        if all(i not in word for i in punct):
            letter_count1 += 1
            letter_count2 += len(word)
avg = float(letter_count2)/float(letter_count1)
print("Average word length: ", avg)
textfile.close()
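The miscount described in this answer can be reproduced on the single word somewo?rd. The sketch below is illustrative only: count_like_question and strip_punct are hypothetical helper names, and stripping the punctuation (rather than skipping the word) is shown as an alternative, not as this answer's method.

```python
punct = "!()-[]{};:'\\,<>./?@#$%^&*_~"


def count_like_question(word):
    """Mimic the per-character loop: one increment per non-punctuation character."""
    count = 0
    for c in word:
        if c not in punct:
            count += 1
    return count


def strip_punct(word):
    """Return the word with every character found in punct removed."""
    return "".join(c for c in word if c not in punct)


word = "somewo?rd"
per_char = count_like_question(word)  # counts characters, not words
cleaned = strip_punct(word)

print(per_char)               # 8: one per character except the '?'
print(cleaned, len(cleaned))  # someword 8
```

A counter that is meant to hold the number of words should go up by 1 for this input, not by 8, which is why the increment has to happen once per word.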
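Answer 2 (score: 0)
I think that in your code "letter_count1" should hold the number of words and "letter_count2" should hold the number of characters, not counting punctuation. Try this: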
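file_name = "/Users/Desktop/name.txt"  # path taken from the question
punct = "!()-+[]{};:'\\,<>./?@#$%^&*_~"
nwords = letters = 0
with open(file_name) as ff:
    for line in ff:
        for w in line.split():
            # Length of the word with its punctuation characters excluded.
            lth = len(w) - len([1 for c in w if c in punct])
            if lth:  # ignore tokens made up only of punctuation
                nwords += 1
                letters += lth
print(letters / nwords)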