Question

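Maybe someone can help me, because I can't find an answer anywhere. I'm French, and counting all the words in a file without missing any of them (é, à, è, ...) is a bit complicated.

That's why I'd like to convert all the words in my file to uppercase and then count them. I'm quite new to Python, and I don't really know how to do this with a file.

I started with this: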

import re 
from collections import Counter 
f = open("vie.txt") 
words = re.findall("[a-zA-Z_]+", f.read()) 
count = len(words) 
print ("Number of total words: %s" % count) 
f.close()

I was thinking of something like this:

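import string

def process_line(ligne, hist):
    # Treat hyphens as word separators
    ligne = ligne.replace('-', ' ')

    for mot in ligne.split():
        # Drop surrounding punctuation, then normalize the case
        mot = mot.strip(string.punctuation + string.whitespace)
        mot = mot.upper()
        if mot:  # skip tokens that were only punctuation
            hist[mot] = hist.get(mot, 0) + 1

def process_file(filename):
    # Build the word histogram line by line (assuming the file is UTF-8)
    hist = dict()
    with open(filename, encoding="utf-8") as f:
        for ligne in f:
            process_line(ligne, hist)
    return hist

def total_mots(hist):
    return sum(hist.values())

hist = process_file("vie.txt")
print('Number of total words:', total_mots(hist))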

But it's too long; I'd like something shorter that doesn't use a dictionary.

Answer 1

Converting all the words to uppercase won't help you with the accents, because the accent is kept:

>>> "é".upper()
'É'

However, you can use Unidecode (pip install unidecode) and then:

>>> import unidecode
>>> unidecode.unidecode("ééé èèè").count(unidecode.unidecode("êêê"))
2
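
If installing a package is not an option, here is a rough standard-library sketch of the same idea, assuming Python 3. Note that the `[a-zA-Z_]+` pattern from the question skips accented letters, which is why words like "été" get miscounted. The helper names `strip_accents` and `count_words` are made up for this example, and `unicodedata` is swapped in for Unidecode, so the results may differ from Unidecode for characters that have no accent-free decomposition.

```python
import re
import unicodedata

def strip_accents(text):
    # Decompose each accented letter into base letter + combining mark,
    # then drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def count_words(text):
    # Compose first so an accented letter is a single character,
    # then count runs of Unicode letters (no digits, no underscore).
    text = unicodedata.normalize("NFC", text)
    return len(re.findall(r"[^\W\d_]+", text))

sample = "L'été à Paris, c'est très agréable."
print(count_words(sample))            # 8
print(strip_accents(sample).upper())  # L'ETE A PARIS, C'EST TRES AGREABLE.
```

For a file, something like `count_words(open("vie.txt", encoding="utf-8").read())` should give the total without building a dictionary, assuming the file is saved as UTF-8. With this pattern an apostrophe splits a word, so "L'été" counts as two words.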
