Question

I want to count a list of emoticons in a list of files (.txt) in the path /test/.

This is how I count a smiley across all files:

    import os

    def count_string_occurrence():
        folder = "C:/users/M/Desktop/test"
        string = ":)"  # define search term
        total = 0
        for file in os.listdir(folder):
            if file.endswith(".txt"):
                # os.listdir returns bare file names, so join them with the folder
                with open(os.path.join(folder, file), encoding="utf8") as f:
                    contents = f.read()
                total += contents.count(string)  # occurrences of the smiley in all files
        print("Number of " + string + " in all files equals " + str(total))

    count_string_occurrence()

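How can I now loop over different emoticons and print the result for each smiley separately? It gets complicated because I am already looping over the different files.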

Answer 1

You can make the search string a function parameter and then call your function several times with different search terms.

    import os

    def count_string_occurrence(string):
        folder = "C:/users/M/Desktop/test"
        total = 0
        for file in os.listdir(folder):
            if file.endswith(".txt"):
                # os.listdir returns bare file names, so join them with the folder
                with open(os.path.join(folder, file), encoding="utf8") as f:
                    contents = f.read()
                total += contents.count(string)  # occurrences of this string in all files
        return total

    smilies = [':)', ':P', '=]']
    for s in smilies:
        total = count_string_occurrence(s)
        print("Number of {} in all files equals {}".format(s, total))

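Another approach is to pass a list of emoticons to your function and iterate over it inside the if block. You could store the results in a dict of the form { ':)': 5, ':P': 4, ... }.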
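The dict-based alternative could look roughly like this. It is a minimal sketch, assuming UTF-8 text files and a directory passed in as a parameter; the name count_strings_in_files is hypothetical. Each file is read once, and every emoticon is counted inside the if block.

```python
import os


def count_strings_in_files(strings, directory):
    """Count occurrences of each string across all .txt files in `directory`.

    Returns a dict such as {':)': 5, ':P': 4, ...}.
    """
    totals = {s: 0 for s in strings}  # start every emoticon at 0
    for file_name in os.listdir(directory):
        if file_name.endswith(".txt"):
            # os.listdir returns bare names, so join them with the directory
            path = os.path.join(directory, file_name)
            with open(path, encoding="utf8") as f:
                contents = f.read()
            # iterate over the emoticons inside the if block
            for s in strings:
                totals[s] += contents.count(s)
    return totals


# Usage (directory taken from the question):
# results = count_strings_in_files([':)', ':P', '=]'], "C:/users/M/Desktop/test")
# for s, total in results.items():
#     print("Number of {} in all files equals {}".format(s, total))
```

Compared with calling the function once per emoticon, this reads each file only once, no matter how many emoticons you search for.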

Answer 2

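Regarding your question: you could keep a dictionary with a count for each string and return that dictionary. But if you keep your current structure, that will be awkward to keep track of.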

Which leads me to my suggestions:

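There is no obvious reason to keep the whole file in memory; you can go through it line by line and check for the strings in the current line.
You are also reading the same files multiple times, when you could read each one only once and check for all the strings.
You are checking the file extension, which sounds like a job for glob.
You can use a defaultdict, so you don't need to care whether a count has been initialized to 0.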

Modified code:

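    from collections import defaultdict
    import glob

    SMILIES = [':)', ':P', '=]']

    def count_in_files(string_list):
        results = defaultdict(int)
        # searches the current working directory
        for file_name in glob.iglob('*.txt'):
            print(file_name)  # show which file is being processed
            with open(file_name, encoding="utf8") as input_file:
                for line in input_file:  # read one line at a time
                    for s in string_list:
                        if s in line:
                            # count every occurrence, not just the matching lines
                            results[s] += line.count(s)
        return results

    print(count_in_files(SMILIES))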

Finally, with this approach, if you are using Python >= 3.5 you can change the glob call to for file_name in glob.iglob('**/*.txt', recursive=True) to search recursively if needed.
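A sketch of that recursive variant, which also prints each smiley's total separately as the question asked. The names count_in_files_recursive and print_counts and the directory parameter are hypothetical; it assumes UTF-8 files and Python >= 3.5 (needed for recursive=True), and it counts every occurrence on a line with str.count.

```python
from collections import defaultdict
import glob
import os


def count_in_files_recursive(string_list, directory="."):
    """Count each string in every .txt file under `directory`, subfolders included."""
    results = defaultdict(int)
    # '**' with recursive=True matches zero or more directories,
    # so files directly inside `directory` are included as well
    pattern = os.path.join(directory, "**", "*.txt")
    for file_name in glob.iglob(pattern, recursive=True):
        with open(file_name, encoding="utf8") as input_file:
            for line in input_file:  # read one line at a time
                for s in string_list:
                    results[s] += line.count(s)
    return results


def print_counts(string_list, results):
    """Print one line per smiley, in the question's message format."""
    for s in string_list:
        print("Number of {} in all files equals {}".format(s, results[s]))


# Usage (hypothetical directory):
# smilies = [':)', ':P', '=]']
# print_counts(smilies, count_in_files_recursive(smilies, "C:/users/M/Desktop/test"))
```

Because results is a defaultdict(int), print_counts reports 0 for a smiley that never appears instead of raising KeyError.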

This prints something like:

    defaultdict(<class 'int'>, {':P': 2, ':)': 1, '=]': 1})

Counting different strings in multiple files
