How to get files of every extension type in a directory using os.walk or glob.glob

Asked: 2019-03-14 16:39:50

Tags: python file-extension os.walk language-detection

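I have code that detects the language of the files in a directory. At the moment it only handles the .txt extension named in the code. How can I detect the language of files with every extension in the directory (e.g. .pdf, .xlsx, .docx), not just the .txt files? The code is attached for reference. I would like to know how to do this with glob and os.walk.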

import csv
from fnmatch import fnmatch
try:
    from langdetect import detect
except ImportError:
    detect = lambda _: '<dunno>'
import os

rootdir = '.'  # current directory
extension = '.txt'
file_pattern = '*' + extension

with open('output.csv', 'w', newline='', encoding='utf-8') as outfile:
    csvwriter = csv.writer(outfile)

    for dirpath, subdirs, filenames in os.walk(os.path.abspath(rootdir)):
        for filename in filenames:
            if fnmatch(filename, file_pattern):
                lang = detect(os.path.join(dirpath, filename))
                csvwriter.writerow([dirpath, filename, lang])

1 answer:

Answer 0 (score: 2)

IIUC, you can replace the fnmatch check with

eoi = ['*.pdf', '*.xlsx', '*.docx', '*.txt']     # extensions of interest list
if any(fnmatch(filename, ext) for ext in eoi):   # filename is the loop variable from the question
    lang = ...
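
For completeness, here is a minimal sketch of how that check might sit inside the full walk, together with a glob.glob variant, since the question asks about both. The helper names find_files, extensions_in and glob_files are hypothetical and not part of the original code, and lowercasing the filename before matching is an added assumption so that '.PDF' and '.pdf' are treated alike.

```python
import glob
import os
from fnmatch import fnmatch

# Extensions of interest, as in the answer above.
EOI = ['*.pdf', '*.xlsx', '*.docx', '*.txt']


def find_files(rootdir, patterns):
    """Yield (dirpath, filename) for every file under rootdir matching any pattern."""
    for dirpath, _subdirs, filenames in os.walk(os.path.abspath(rootdir)):
        for filename in filenames:
            # Assumption: match case-insensitively by lowercasing the name.
            if any(fnmatch(filename.lower(), pat) for pat in patterns):
                yield dirpath, filename


def extensions_in(rootdir):
    """Return the set of distinct (lowercased) extensions found under rootdir."""
    return {
        os.path.splitext(name)[1].lower()
        for _dirpath, _subdirs, names in os.walk(rootdir)
        for name in names
    }


def glob_files(rootdir, patterns):
    """glob.glob variant: '**' with recursive=True also descends into subdirectories."""
    paths = []
    for pat in patterns:
        paths.extend(glob.glob(os.path.join(rootdir, '**', pat), recursive=True))
    return [p for p in paths if os.path.isfile(p)]
```

If the goal really is every extension present, extensions_in('.') could be used to list them first and build the pattern list from the result. Note also that langdetect's detect takes a text string rather than a path, so the file's content would have to be read (or extracted, for .pdf, .xlsx and .docx) before calling it; that extraction step is outside this sketch.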