Question

So I created this folder, C:\TempFiles, to test-run the following code snippet.

Inside this folder I have two files -> nd1.txt and nd2.txt, and a folder C:\TempFiles\Temp2, which contains only one file, nd3.txt.

Now when I execute this code:

import os,file,storage
database = file.dictionary()
tools = storage.misc()
lui = -1                           # last used file index
fileIndex = 1

def sendWord(wrd, findex):                  # where findex is the file index
    global lui
    if findex!=lui:
        tools.refreshRecentList()
        lui = findex
    if tools.mustIgnore(wrd)==0 and tools.toRecentList(wrd)==1:
        database.addWord(wrd,findex)        # else there's no point adding the word to the database, because it's either trivial, or has recently been added

def showPostingsList():
    print("\nPOSTING's LIST")
    database.display()

def parseFile(nfile, findex):
    for line in nfile:
        pl = line.split()
        for word in pl:
            sendWord(word.lower(),findex)

def parseDirectory(dirname):
    global fileIndex
    for root,dirs,files in os.walk(dirname):
        for name in dirs:
            parseDirectory(os.path.join(root,name))
        for filename in files:
            nf = open(os.path.join(root,filename),'r')
            parseFile(nf,fileIndex)
            print(" --> "+ nf.name)
            fileIndex+=1
            nf.close()

def main():
    dirname = input("Enter the base directory :-\n")
    print("\nParsing Files...")
    parseDirectory(dirname)
    print("\nPostings List has Been successfully created.\n",database.entries()," word(s) sent to database")
    choice = ""
    while choice!='y' and choice!='n':
        choice = str(input("View List?\n(Y)es\n(N)o\n -> ")).lower()
        if choice!='y' and choice!='n':
            print("Invalid Entry. Re-enter\n")
    if choice=='y':
        showPostingsList()

main()

I should be traversing only three files on each run, and I added a print(filename) to check that, but apparently I am traversing the inner folder twice:

Enter the base directory :-
C:\TempFiles

Parsing Files...
 --> C:\TempFiles\Temp2\nd3.txt
 --> C:\TempFiles\nd1.txt
 --> C:\TempFiles\nd2.txt
 --> C:\TempFiles\Temp2\nd3.txt

Postings List has Been successfully created.
 34  word(s) sent to database
View List?
 (Y)es
 (N)o
-> n

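Can anyone tell me how to change my use of os.walk() to avoid this? It's not that my output is incorrect, but it traverses an entire folder twice, and that's not very efficient.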

Answer 1

Your problem isn't specific to Python 3; it's how os.walk() works. Iterating over it already recurses into subfolders, so you can take out your recursive call:

def parseDirectory(dirname):
    global fileIndex
    for root,dirs,files in os.walk(dirname):
        for filename in files:
            nf = open(os.path.join(root,filename),'r')
            parseFile(nf,fileIndex)
            print(" --> "+ nf.name)
            fileIndex+=1
            nf.close()

By calling parseDirectory() on each entry in dirs, you were starting a second, independent walk of your only subfolder.
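
To see the difference in isolation, here is a minimal sketch that rebuilds the folder layout from the question in a temporary directory and counts file visits with and without the extra recursive call. The helper names (collect_files, collect_files_recursing, make_sample_tree, demo) are made up for this illustration and are not part of the original code.

```python
import os
import tempfile

def collect_files(base):
    """Single os.walk() pass: every file is visited exactly once."""
    found = []
    for root, dirs, files in os.walk(base):
        for filename in files:
            found.append(os.path.join(root, filename))
    return found

def collect_files_recursing(base):
    """Same loop plus the redundant recursive call from the question."""
    found = []
    for root, dirs, files in os.walk(base):
        for name in dirs:
            # os.walk() will descend into this folder anyway, so this
            # starts a second, independent walk of the same subtree.
            found.extend(collect_files_recursing(os.path.join(root, name)))
        for filename in files:
            found.append(os.path.join(root, filename))
    return found

def make_sample_tree(base):
    """Create nd1.txt, nd2.txt and Temp2/nd3.txt under base (hypothetical data)."""
    os.mkdir(os.path.join(base, "Temp2"))
    for rel in ("nd1.txt", "nd2.txt", os.path.join("Temp2", "nd3.txt")):
        with open(os.path.join(base, rel), "w") as fh:
            fh.write("sample text\n")

def demo():
    """Build the sample tree and return both visit lists."""
    with tempfile.TemporaryDirectory() as base:
        make_sample_tree(base)
        return collect_files(base), collect_files_recursing(base)

once, twice = demo()
print(len(once), "visits with a single walk")
print(len(twice), "visits with the extra recursion")
```

With this layout the single walk reports 3 visits and the recursive version reports 4, because nd3.txt is reached once by the nested call and once more by the outer walk. With deeper trees the duplication compounds, since every nested call repeats the same mistake.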
