Search a .txt file for multiple word lists and output count / % / words

Date: 2019-05-15 21:12:15

Tags: python-3.x nltk

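I am trying to search a given .txt file (source_filename) for several lists of terms and report, for each list, the number of matches, the percentage of the file those terms make up, and the exact words from that list that were found in the text.

How should I set up the counting/reporting function?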


#build GUI for text file selection
import PySimpleGUI as sg      
window_rows = [[sg.Text('Please select a .txt file for analysis')],      
                 [sg.InputText(), sg.FileBrowse()],      
                 [sg.Submit(), sg.Cancel()]]      
window = sg.Window('Cool Tool Name', window_rows)    
event, values = window.Read()    
window.Close()
source_filename = values[0]    

#written communication term list
dwrit = ('write','written','writing', 'email', 'memo')
written = dwrit

#oral communication term list
doral = ('oral','spoken','talk','speech')
oral = doral 

#visual communication term list
dvis = ('visual','sight') 
visual = dvis

#auditory communication term list
daud = ('hear', 'hearing', 'heard')
auditory = daud

#multimodal communication term list
dmm = ('multimodal','multi-modal','mixed media','audio and visual')
multimodal = dmm

#define all term lists 
communication = (dwrit, doral, dvis, daud, dmm)

#search lists
from collections import Counter
with open(source_filename, encoding = 'ISO-8859-1') as f:
     for line in f:
         Counter.update(line.lower().split())
print(Counter(communication))

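The problem is that I am currently printing all of the terms in all of the lists, rather than actually searching the document for just the listed terms and ignoring everything else...

The ideal output would look like this:

Written: [number, %, words]

Oral: [number, %, words]

Visual: [number, %, words]

Auditory: [number, %, words]

Multimodal: [number, %, words]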

1 Answer:

Answer 0: (score: 0)

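A Counter is a dictionary keyed by the things you are counting. That is why you see every word: you are printing the whole Counter instead of looking up only the words you care about (as keys in the Counter). Here is an example of how to handle one of the lists; you can use the same pattern for the others.

Try this: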

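from collections import Counter

# Count every word in the file; source_filename and written come from the question's code
c = Counter()
with open(source_filename, encoding='ISO-8859-1') as f:
    for line in f:
        c.update(line.lower().split())

# Number of distinct terms from the written list that appear in the file
written_words = len([x for x in written if x in c])
# Share of the file's distinct words that are written terms, as a percentage
print(f'Written: [{written_words}, {100 * written_words / len(c):.2f} %]')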
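
To get the full [number, %, words] report for every list, the same lookup pattern can be extended along the following lines. This is only a sketch under some assumptions: the names TERM_LISTS, tokenize, count_term and report are made up for illustration, the term lists are adapted from the question, the percentage is taken as matches divided by the total number of words in the text, and punctuation is stripped with a simple regular expression so that "memo." still matches "memo". Multi-word terms such as 'mixed media' cannot be found by looking up single words, so they are matched against consecutive tokens instead.

```python
import re
from collections import Counter

# Term lists adapted from the question; the labels are illustrative.
TERM_LISTS = {
    'Written': ('write', 'written', 'writing', 'email', 'memo'),
    'Oral': ('oral', 'spoken', 'talk', 'speech'),
    'Visual': ('visual', 'sight'),
    'Auditory': ('hear', 'hearing', 'heard'),
    'Multimodal': ('multimodal', 'multi-modal', 'mixed media', 'audio and visual'),
}


def tokenize(text):
    """Lower-case the text and split it into words, keeping internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def count_term(tokens, counts, term):
    """Count occurrences of a single- or multi-word term."""
    target = tokenize(term)
    if not target:
        return 0
    if len(target) == 1:
        # A Counter returns 0 for keys it has never seen.
        return counts[target[0]]
    n = len(target)
    # Slide a window of n tokens over the text and compare with the phrase.
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def report(text, term_lists=TERM_LISTS):
    """Return {label: [count, percent_of_all_words, sorted_words_found]}."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    result = {}
    for label, terms in term_lists.items():
        hits = {term: count_term(tokens, counts, term) for term in terms}
        found = sorted(term for term, n in hits.items() if n)
        count = sum(hits.values())
        percent = 100 * count / total if total else 0.0
        result[label] = [count, round(percent, 2), found]
    return result
```

With the file from the question, this could be called as `report(open(source_filename, encoding='ISO-8859-1').read())` and each entry printed as `label: [count, percent, words]`. Note that a phrase counts as one match, and that words inside a phrase (for example 'visual' in 'audio and visual') are also counted for any single-word list that contains them.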