Word count of a text file in Python

Time: 2015-11-30 02:42:17

Tags: python function text

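I am trying to compute the frequency of words in a text file using a Python function. I can get the frequency of every word individually, but I am trying to get the counts for specific words only by putting them in a list. This is what I have so far, but I am currently stuck: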

def repeatedWords():
    with open(fname) as f:
        wordcount={}
        for word in word_list:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
            for k,v in wordcount.items():
                 print k, v

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('file.txt')

After the update, it still shows all the words:

def repeatedWords(fname, word_list):
    with open(fname) as f:
        wordcount = {}
        for word in word_list:
            for word in f.read().split():
                wordcount[word] = wordcount.get(word, 0) + 1

    for k,v in wordcount.items():
        print k, v

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
repeatedWords('Emma.txt', word_list)

2 answers:

Answer 0: (score: 1)

So you only want the frequency of the specific words in that list (Emma, Woodhouse, Father, ...)? If so, this code may help (try running it):

    word_list = ['Emma','Woodhouse','father','Taylor','Miss','been','she','her']
    # I'm using this example text in place of the file you are using
    text = 'This is an example text. It will contain words you are looking for, like Emma, Emma, Emma, Woodhouse, Woodhouse, Father, Father, Taylor,Miss,been,she,her,her,her. I made them repeat to show that the code works.'
    text = text.replace(',',' ') #these statements remove irrelevant punctuation
    text = text.replace('.','')
    text = text.lower() # this makes all the words lowercase, so that capitalization won't affect the frequency measurement

    for repeatedword in word_list:
        counter = 0 #counter starts at 0
        for word in text.split():
            if repeatedword.lower() == word:
                counter = counter + 1 #add 1 every time there is a match in the list
        print(repeatedword,':', counter) #prints the word from 'word_list' and its frequency

The output shows the frequency of only those words in the list you provided. Is that what you wanted?

The output produced when run in Python 3 is:

    Emma : 3
    Woodhouse : 2
    father : 2
    Taylor : 1
    Miss : 1
    been : 1
    she : 1
    her : 3
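
For comparison, the same idea applied to arbitrary text (or to a file) might be sketched as below. This is only a sketch in Python 3, not the code from the answer: the helper names `count_listed_words` and `count_listed_words_in_file` are made up for illustration, and it swaps the manual `replace` calls for `re.findall` plus `collections.Counter`.

```python
import re
from collections import Counter

def count_listed_words(text, word_list):
    """Return {word: count} for the words in word_list only, ignoring case."""
    # \w+ pulls out runs of letters/digits, so commas and periods are dropped
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for words that never appeared
    return {word: counts[word.lower()] for word in word_list}

def count_listed_words_in_file(fname, word_list):
    """Hypothetical wrapper: read the whole file once, then count."""
    with open(fname) as f:
        return count_listed_words(f.read(), word_list)

sample = "Emma, Emma, Woodhouse. Her father; her Father!"
print(count_listed_words(sample, ["Emma", "Woodhouse", "father", "she", "her"]))
```

Reading the file once and filtering against the list afterwards avoids scanning the text again for every word in the list.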

Answer 1: (score: 0)

The best way to handle this is to use the get method of Python dictionaries. It could look like this:

def repeatedWords(fname):
    with open(fname) as f:
        wordcount = {}
        # Example list of words not needed
        nonwordlist = ['father', 'Miss', 'been']
        # Read the file once; a second f.read() would return an empty string
        for word in f.read().split():
            if word not in nonwordlist:
                wordcount[word] = wordcount.get(word, 0) + 1
    # Return the counts so they can be used outside the function
    return wordcount

# Put these outside the function repeatedWords
wordcount = repeatedWords('Emma.txt')
for k,v in wordcount.items():
    print k, v

The print statement should give you:

word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
newDict = {}
for newWord in word_list:
    newDict[newWord] = newDict.get(newWord, 0) + 1

print newDict

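What the line wordcount[word] = wordcount.get(word, 0) + 1 does is first look up word in the dictionary wordcount. If the word already exists, it gets its current value and adds 1. If word does not exist, the value defaults to 0 and 1 is added in this case, making this the first occurrence of that word with a count of 1.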
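
A small illustration of that behaviour, written for Python 3. The function name `count_words` and the sample list are assumptions made up for this sketch; only the `get` line comes from the answer.

```python
def count_words(words):
    """Count occurrences of each item in words using dict.get."""
    wordcount = {}
    for word in words:
        # get returns the stored count, or the default 0 if word is not a key yet
        wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount

counts = count_words(["her", "Emma", "her"])
print(counts)  # {'her': 2, 'Emma': 1}
```

The first time "her" is seen, `get` falls back to 0 and the count becomes 1; the second time it finds 1 and the count becomes 2.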