Question

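I'm trying to build a script that goes through all the files in a directory and identifies the type of each one. At the end, it should print the total count for each file type it found. I'm using the magic library to identify file types by MIME type.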

import os

import magic  # filemagic package (provides magic.Magic and id_filename)

for filename in os.listdir(os.getcwd()):
    print(filename)
    with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
        t = m.id_filename(filename)
        print(t)

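The identification code above seems to work fine, but I don't know how to store the file types it finds along with their counts. The output should look like this:

filetype1 count
filetype2 count
...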

What is the best way to do this?

Answer 1

You can create a dictionary that maps each file type to its count, for example:

file_types = {'filetype1' : 10, 'filetype2': 20, ...}

Note that your current solution only covers the current directory, not its subdirectories.
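To cover subdirectories as well, the standard library's `os.walk` descends into every directory under a starting point. A minimal sketch; `iter_files` is a hypothetical helper name. Each yielded path includes its directory, so it stays valid regardless of the working directory and can be passed to an identifier such as the question's `m.id_filename`.

```python
import os


def iter_files(root):
    """Yield the path of every non-directory entry under root, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Join with dirpath: os.walk gives bare names, not full paths.
            yield os.path.join(dirpath, name)
```

Used in place of `os.listdir(os.getcwd())`, the loop becomes `for path in iter_files(os.getcwd()): ...`, and the counting code stays the same.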

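import os

import magic  # filemagic package, as in the question

file_types = {}

# Open the magic database once and reuse it for every file.
with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
    for filename in os.listdir(os.getcwd()):
        t = m.id_filename(filename)
        file_types.setdefault(t, 0)  # add the type with count 0 if it is new
        file_types[t] += 1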
...

This adds each new file type to the dictionary and keeps its count for you.
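To print the result in the format asked for in the question (one `filetype count` pair per line), the dictionary can be walked after the loop. A minimal sketch with hypothetical helpers `tally` and `format_counts`; `dict.get(t, 0) + 1` is an equivalent alternative to the `setdefault` pattern above, and the MIME strings are made-up sample input.

```python
def tally(types):
    """Count how often each file type occurs, using a plain dict."""
    counts = {}
    for t in types:
        # dict.get supplies 0 for a type that has not been seen yet.
        counts[t] = counts.get(t, 0) + 1
    return counts


def format_counts(counts):
    """Return one 'filetype count' line per type, sorted by type name."""
    return [f"{t} {n}" for t, n in sorted(counts.items())]


for line in format_counts(tally(["text/plain", "application/pdf", "text/plain"])):
    print(line)
```

With that sample input it prints `application/pdf 1` followed by `text/plain 2`.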

Answer 2

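You can use the Counter class from the collections module. It is a dict subclass with a few extra methods, and you don't have to initialize a key to 0 before incrementing it.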
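A short illustration of that behavior: incrementing a missing key starts it at 0, reading a missing key returns 0 without inserting it, and `most_common` ranks the entries. The MIME strings are sample values only.

```python
from collections import Counter

counter = Counter()
counter["text/plain"] += 1  # no KeyError: a missing key starts at 0
counter["text/plain"] += 1
counter["image/png"] += 1

print(counter["application/pdf"])  # 0, and the key is not added
print(counter.most_common(1))      # [('text/plain', 2)]
```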

I don't have the magic library you mentioned, so here is an example that uses my_magic as a stand-in:

import collections
import os

def my_magic(filename):
    """
    This function is just a placeholder to be used in place of your id_filename()
    method.
    """
    if filename.endswith(".txt"): 
        return "TXT"
    elif filename.endswith(".pdf"):
        return "PDF"
    else:
        return "other"

# initialize the counter object:
counter = collections.Counter()

for filename in os.listdir(os.getcwd()):
    print(filename)

    # substitute the next line with whatever you use to determine the
    # type of the file:
    t = my_magic(filename)
    print(t)

    # increase the count for the current value of 't':
    counter[t] += 1

# output what is in counter, one "filetype count" pair per line:
for file_type, n in counter.items():
    print(file_type, n)
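A Counter can also be built directly from an iterable, which replaces the explicit `counter[t] += 1` loop, and `most_common()` lists the types from most to least frequent. A sketch under these assumptions: `count_types` and `by_extension` are hypothetical names, and `by_extension` is only a stand-in classifier like `my_magic`; any callable that maps a filename to a type string (such as the question's `m.id_filename`) can be passed instead.

```python
from collections import Counter


def count_types(paths, classify):
    """Build a Counter of file types in one pass over paths."""
    return Counter(classify(p) for p in paths)


def by_extension(filename):
    # Hypothetical stand-in classifier, mirroring my_magic above.
    if filename.endswith(".txt"):
        return "TXT"
    if filename.endswith(".pdf"):
        return "PDF"
    return "other"


counts = count_types(["a.txt", "b.pdf", "c.txt", "d.png"], by_extension)
for file_type, n in counts.most_common():
    print(file_type, n)
```

Here `TXT 2` is printed first; types with equal counts keep the order in which they were first encountered.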
