Python 3: use a word Counter to list the lines where words from an array appear in a file, and how many times each appears

Time: 2016-11-17 15:24:35

Tags: python arrays counter python-3.5

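So my problem is this: I created an array in another script called 'GA' to store words, because it could eventually hold more than 100 words. I'm trying to import this array, search for its words in another txt document, and output how many times each word was found. In the first part of my code, `def ReadFile`, I open the file, clean it up, and display the lines those words appear on.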

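The problem is that I can't find a way to display the lines containing the words and also output how many times each word was hit. Here is my code: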

import re
from collections import Counter
from Categories.GoingAce import GA

path = "ChatLogs/Chat1.txt"
file = path

Lex = Counter(GA)

count = {}

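def ReadFile():
    # Print each line that contains at least one word from Lex (the GA words),
    # with every <...> tag stripped out.
    # Note: `word in line` is a substring test on the raw line, tags included.
    with open(file) as file_read:
        content = file_read.readlines()
        for line in content:
            if any(word in line for word in Lex):
                Cleanse = re.sub('<.*?>', '', line)
                print(Cleanse)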

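def WordCount():
    # Count every whitespace-separated token in the file (not only the
    # GA words) and print "word<TAB>count" for each one.
    # Note: this local Lex shadows the global Lex = Counter(GA).
    with open(file) as f:
        Lex = Counter(f.read().split())
    for item in Lex.items():
        print("{}\t{}".format(*item))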


#ReadFile()
WordCount()
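The question's `WordCount` counts every token in the file rather than only the `GA` words. A minimal sketch of restricting a `Counter` to a word list, assuming a hypothetical `GA = ['hi', 'u', '13']` (the example list given later in the question), case-insensitive matching, and that a "word" is a run of letters and digits left after the tags are removed. The function name `count_ga_words` is hypothetical:

```python
import re
from collections import Counter

GA = ['hi', 'u', '13']  # hypothetical stand-in for Categories.GoingAce.GA

def count_ga_words(lines, wanted):
    """Count how often each word in `wanted` occurs in `lines`."""
    wanted = set(wanted)  # set membership stays fast as the list grows
    counts = Counter()
    for line in lines:
        text = re.sub(r'<.*?>', '', line)           # drop the <...> tags first
        tokens = re.findall(r'\w+', text.lower())   # lower-case, split into whole words
        counts.update(tok for tok in tokens if tok in wanted)
    return counts

sample = [
    '<200>   <ilovethaocean> <08/22/06 12:15:36 AM>  hi asl?',
    '<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  32 m fresno',
    '<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  u?',
    '<200>   <ilovethaocean> <08/22/06 12:16:12 AM>  "13/f/ca, how r u?"',
    '<200>   <a_latino_man559>   <08/22/06 12:16:18 AM>  13?',
]
print(count_ga_words(sample, GA))
```

Removing the tags before splitting keeps the dates and user names inside `<...>` from producing stray matches.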

The raw input looks like this:

<200>   <ilovethaocean> <08/22/06 12:15:36 AM>  hi asl?
<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  32 m fresno
<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  u?
<200>   <ilovethaocean> <08/22/06 12:16:12 AM>  "13/f/ca, how r u?"
<200>   <a_latino_man559>   <08/22/06 12:16:18 AM>  13?

Then I use this to remove everything inside angle brackets:

Cleanse = re.sub('<.*?>', '', line)
print(Cleanse)
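As a minimal sketch (the helper name `strip_tags` is hypothetical), the same non-greedy pattern can be compiled once and combined with `str.strip()`, which removes the whitespace left behind by the deleted tags and the trailing newline, so `print` no longer adds blank lines between results:

```python
import re

# Non-greedy: '<.*?>' matches one <...> tag at a time rather than
# everything from the first '<' to the last '>'.
TAG_RE = re.compile(r'<.*?>')

def strip_tags(line):
    """Remove every <...> tag, then trim leftover whitespace and the newline."""
    return TAG_RE.sub('', line).strip()

raw = [
    '<200>   <ilovethaocean> <08/22/06 12:15:36 AM>  hi asl?\n',
    '<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  32 m fresno\n',
]
for line in raw:
    print(strip_tags(line))
```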

which gives output like this:

hi asl?

32 m fresno

u?

"13/f/ca, how r u?"

13?

Beyond that, as an example, if my GA array contains (hi, u, 13), my ideal output would look like this:

hi appeared 1 time line_num hi asl?

u appeared 2 times line_num u?

line_num "13/f/ca, how r u?"
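A minimal sketch of producing both parts of that output in one pass, under the same assumptions as above (hypothetical `GA = ['hi', 'u', '13']`, case-insensitive whole-word matching after the tags are removed; the function name `report` is also hypothetical). It keeps a `Counter` for the totals and a `defaultdict(list)` for the line numbers:

```python
import re
from collections import Counter, defaultdict

GA = ['hi', 'u', '13']  # hypothetical stand-in for Categories.GoingAce.GA

def report(lines, wanted):
    """Return (counts, hits): how often each wanted word occurs, and
    the (line_number, cleaned_line) pairs it occurs on."""
    wanted = set(wanted)
    counts = Counter()
    hits = defaultdict(list)
    for line_num, line in enumerate(lines, start=1):
        cleaned = re.sub(r'<.*?>', '', line).strip()   # remove <...> tags
        tokens = re.findall(r'\w+', cleaned.lower())   # whole words only
        found = Counter(tok for tok in tokens if tok in wanted)
        counts.update(found)
        for word in found:                             # record each line once per word
            hits[word].append((line_num, cleaned))
    return counts, hits

sample = [
    '<200>   <ilovethaocean> <08/22/06 12:15:36 AM>  hi asl?',
    '<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  32 m fresno',
    '<210>   <a_latino_man559>   <08/22/06 12:15:53 AM>  u?',
    '<200>   <ilovethaocean> <08/22/06 12:16:12 AM>  "13/f/ca, how r u?"',
    '<200>   <a_latino_man559>   <08/22/06 12:16:18 AM>  13?',
]

counts, hits = report(sample, GA)
for word in GA:
    print('{} appeared {} time(s)'.format(word, counts[word]))
    for line_num, text in hits[word]:
        print('    {}  {}'.format(line_num, text))
```

Because iterating over an open file yields its lines, a real chat log could be passed the same way, e.g. `with open(path) as f: counts, hits = report(f, GA)`.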

1 answer:

Answer 0 (score: 0)

Here is one way to do it, shown on a simplified example:

from collections import defaultdict

path_to_your_file = "your_file.txt"  # placeholder: replace with the file you want to search

occurrences = defaultdict(list)  # word -> list of (line_number, line) pairs
words = ['cat', 'dog', 'bird', 'person']

with open(path_to_your_file) as f:
    for i, line in enumerate(f, start=1):
        for word in words:
            if word in line:
                occurrences[word].append((i, line))

for word, matches in occurrences.items():
    # Sum the hits per line, since a word can appear more than once in a line
    total_count = sum(line.count(word) for _, line in matches)
    print('%s appeared %d time(s). Line(s):' % (word, total_count))
    print('\n'.join('\t %d) %s' % (line_num, line.strip()) for line_num, line in matches))

Given a text file with the following contents:

cat, rat, dog, cat
bird, person
animal
insect
whatever
another bird
etc.

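the script prints the following (the order of the words follows dict iteration order, so it can differ between Python versions):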

bird appeared 2 time(s). Line(s):
     2) bird, person
     6) another bird
person appeared 1 time(s). Line(s):
     2) bird, person
dog appeared 1 time(s). Line(s):
     1) cat, rat, dog, cat
cat appeared 2 time(s). Line(s):
     1) cat, rat, dog, cat
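One caveat with the answer's `word in line` and `line.count(word)`: both are substring tests, so `cat` would also match inside `concatenate`, and a short word like `u` from the question would match inside many unrelated words. A minimal sketch of whole-word matching with `\b` anchors (the helper name `count_whole_word` is hypothetical):

```python
import re

def count_whole_word(word, line):
    """Count occurrences of `word` in `line` as a whole word, not a substring."""
    # re.escape guards against words containing regex metacharacters;
    # \b anchors the match at a word boundary on both sides.
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, line))

line = 'cat, rat, dog, cat, concatenate'
print(line.count('cat'))              # 3: substring count includes "concatenate"
print(count_whole_word('cat', line))  # 2: whole words only
```

Swapping these calls in for `word in line` and `line.count(word)` changes only the matching rule; the rest of the answer's structure stays the same.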