Question

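I'm new to Python, and I'm stuck on a problem I ran into while studying loops and folder navigation.

The task is simple: loop through a folder and count all the ".txt" files.

I believe there may be modules that solve this task easily, and I'd be grateful if you could share them. However, since this is just a random problem I came across while learning Python, it would be great if it could be solved with the tools I've just learned, such as for/while loops.

I used for and while loops to go through the folder, but I can't traverse it completely.

Here is the code I used: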

import os
count=0 # set count default
path = 'E:\\' # set path
while os.path.isdir(path):
    for file in os.listdir(path): # loop through the folder
        print(file)   # print text to keep track the process
        if file.endswith('.txt'):
            count+=1
            print('+1')   #
        elif os.path.isdir(os.path.join(path,file)): #if it is a subfolder
            print(os.path.join(path,file))
            path=os.path.join(path,file)
            print('is dir')
            break
        else:
            path=os.path.join(path,file)

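Since the number of files and subfolders in the folder is unknown, I thought a while loop would be appropriate here. However, my code has many bugs and pitfalls that I don't know how to fix. For example, if there are several subfolders, this code only loops through the first one and ignores the rest.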

Answer 1

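Your problem is that you quickly end up testing files that don't exist. Imagine a directory structure where the first entry seen is a non-directory named A (E:\A), followed by a file b (E:\b).

On the first iteration you get A, find that it doesn't end with .txt and that it isn't a directory, so you change path to E:\A.

On the second iteration you get b (meaning E:\b), but all of your tests (apart from the .txt extension test) and operations join it to the new path, so you are testing E:\A\b instead of E:\b.

Similarly, if E:\b were a directory, you would immediately break out of the inner loop, so even if E:\c.txt exists, you never even see it if it comes after E:\b in the directory's iteration order.

Directory-tree traversal code needs some kind of stack, either explicit (appending directories to a list and popping them off for eventual processing) or implicit (via recursion, which uses the call stack for the same purpose).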
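For illustration only, here is a minimal sketch of the explicit-stack approach applied to your loop, not a definitive implementation. The function name count_txt_files is hypothetical, and it assumes that directories whose names end in .txt should be searched rather than counted, and that symbolic links to directories should not be followed (so a link cycle cannot cause an endless loop).

```python
import os


def count_txt_files(start_path):
    """Count .txt files in start_path and all of its subfolders (explicit stack)."""
    count = 0
    pending = [start_path]  # stack of directories that still need to be listed
    while pending:  # loop until every discovered directory has been processed
        current = pending.pop()  # take the most recently discovered directory
        for name in os.listdir(current):
            # Always join against the directory actually being listed,
            # so the path never drifts into a file or a sibling folder.
            full_path = os.path.join(current, name)
            if os.path.isdir(full_path):
                # Remember the subfolder for later instead of switching to it
                # and breaking out of the loop; symlinked folders are skipped.
                if not os.path.islink(full_path):
                    pending.append(full_path)
            elif name.endswith('.txt'):
                count += 1
    return count
```

Calling count_txt_files('E:\\') would count the .txt files on the whole E: drive. Because every subfolder is pushed onto pending instead of replacing the current path, the while loop keeps running until all of them, not just the first, have been listed.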

In any case, you should really handle your specific case with os.walk:

import os

count = 0
path = 'E:\\'
for root, dirs, files in os.walk(path):
    print(root)  # print text to keep track of the process
    count += sum(1 for f in files if f.endswith('.txt'))

    # This second line matches your existing behavior, but might not be intended.
    # Remove it if directories ending in .txt should not be included in the count.
    count += sum(1 for d in dirs if d.endswith('.txt'))

Answer 2

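You probably want to apply recursion to this problem. In short, you need a function that processes a directory and calls itself whenever it encounters a subdirectory.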
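A minimal sketch of that idea, using only os.listdir and a for loop as in the question. The function name count_txt_recursive is hypothetical, and skipping symbolic links to directories is an assumption made to avoid cycles:

```python
import os


def count_txt_recursive(path):
    """Return the number of .txt files in path and all of its subfolders."""
    count = 0
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path):
            # A subfolder: a new call of this same function counts it,
            # and its total is added to ours. Symlinked folders are skipped.
            if not os.path.islink(full_path):
                count += count_txt_recursive(full_path)
        elif name.endswith('.txt'):
            count += 1
    return count
```

Each call handles exactly one folder, and the call stack remembers where to resume in the parent folder, so sibling subfolders are never lost. Extremely deep folder trees could exceed Python's default recursion limit (1000), in which case an explicit stack or os.walk is the safer choice.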

Answer 3

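For nested directories it is easier to use a function like os.walk. Take this as an example:

import os

subfiles = []
for dirpath, subdirs, files in os.walk(path):
    for x in files:
        if x.endswith(".txt"):
            subfiles.append(os.path.join(dirpath, x))

This builds a list of the paths of all the .txt files, so len(subfiles) is the count. Otherwise, you need recursion for this kind of task.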

Answer 4

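This may be more than you need, but it lets you list all the .txt files in a directory, and you can also add conditions on the files' contents to the search. Here is the function: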

import os

import pandas as pd


def file_search(root, extension, search, search_type):
    col1 = []  # folder of each matching file
    col2 = []  # name of each matching file
    for subdir, dirs, files in os.walk(root):
        for file in files:
            # Match the extension at the end of the name, not anywhere in it
            if file.lower().endswith("." + extension.lower()):
                try:
                    with open(os.path.join(subdir, file)) as f:
                        contents = f.read().lower()
                except (OSError, UnicodeDecodeError):
                    continue  # skip files that cannot be opened or decoded
                if search_type == 'any':
                    matched = any(word.lower() in contents for word in search)
                elif search_type == 'all':
                    matched = all(word.lower() in contents for word in search)
                else:
                    matched = False
                if matched:
                    col1.append(subdir)
                    col2.append(file)
    df = pd.DataFrame({'Folder': col1,
                       'File': col2})[['Folder', 'File']]
    return df

Here is an example of how to use the function:

search_df = file_search(root='E:\\',
                        search=['foo', 'bar'],  # words to search for
                        extension='txt',        # could change this to 'csv' or 'sql' etc.
                        search_type='all')      # use 'any' or 'all'

search_df

Answer 5

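@ShadowRanger's answer already covers the analysis of your code well. I'll try to address this part of your question:

"I believe there may be some modules that can easily solve this task"

For this kind of task there is in fact the glob module, which implements Unix-style pathname pattern expansion.

To count the .txt files in a directory and all of its subdirectories, you can simply use the following: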

import os
from glob import iglob, glob

dirpath = '.'  # for example

# getting all matching elements in a list and computing its length
len(glob(os.path.join(dirpath, '**/*.txt'), recursive=True))
# 772

# or iterating through all matching elements and summing 1 each time a new item is found
# (this approach is more memory-efficient)
sum(1 for _ in iglob(os.path.join(dirpath, '**/*.txt'), recursive=True))
# 772

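Basically, glob.iglob() is the iterator version of glob.glob(): it yields matching paths one at a time instead of building the whole list first.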
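The standard-library pathlib module offers the same kind of recursive matching through Path.rglob(), which behaves like a glob with "**/" added in front of the pattern. A minimal sketch, where the function name count_txt_pathlib is hypothetical and the is_file() filter is an added assumption that leaves out directories whose names end in .txt:

```python
from pathlib import Path


def count_txt_pathlib(dirpath='.'):
    """Count .txt files in dirpath and all of its subdirectories."""
    # rglob('*.txt') searches dirpath recursively, like glob('**/*.txt').
    # is_file() drops any matching entry that is actually a directory.
    return sum(1 for p in Path(dirpath).rglob('*.txt') if p.is_file())
```

Like iglob(), rglob() produces its matches lazily, so the count is computed without building a list.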

How to fully traverse a folder in Python?
