Question

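I am creating a simple Python script that finds and replaces strings inside files, including files in subfolders and so on. This requires recursion.

The following script finds a string in every file inside every folder of the target parent folder and replaces it with another string.

I found this post here suggesting the fileinput module, to avoid reading the whole file into memory, which could slow things down...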

...simplify the text replacement in a file without requiring to read the whole file in memory...

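Python is very dynamic and, honestly, I get lost in the many different ways of accomplishing the same task.

How can I integrate this approach into my script below?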

import subprocess, os, fnmatch

if os.name == 'nt':
    def clear_console():
        subprocess.call("cls", shell=True)
        return
else:
    def clear_console():
        subprocess.call("clear", shell=True)
        return

# Globals
menuChoice = 0
searchCounter = 0

# Recursive find/replace with file extension argument.
def findReplace(directory, find, replace, fileExtension):

    global searchCounter

    #For all paths, sub-directories & files in (directory)...
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        #For each file found with (FileExtension)...
        for filename in fnmatch.filter(files, fileExtension):
            #Construct the target file path...
            filepath = os.path.join(path, filename)
            #Open file correspondent to target filepath.
            with open(filepath) as f:
                # Read it into memory.
                s = f.read()
            # Find and replace all occurrences of (find).
            s = s.replace(find, replace)
            # Write these new changes to the target file path.
            with open(filepath, "w") as f:
                f.write(s)
                # increment search counter by one.
                searchCounter += 1

    # Report final status.
    print ('  Files Searched: ' + str(searchCounter))
    print ('')
    print ('  Search Status : Complete')
    print ('')
    input ('  Press Enter to exit...')

def mainMenu():
    global menuChoice
    global searchCounter

    # range() excludes its stop value, so range(1, 2) contains only 1.
    while int(menuChoice) not in range(1, 2):

        clear_console()
        print ('')
        print ('  frx v1.0 - Menu')
        print ('')
        print ('  A. Select target file type extension.')
        print ('  B. Enter target directory name. eg -> target_directory/target_subfolder')
        print ('  C. Enter string to Find.')
        print ('  D. Enter string to Replace.')
        print ('')
        print ('  Menu')
        print ('')

        menuChoice = input('''
      1. All TXT  files. (*.txt )

      Enter Option: ''')
        print ('')

        # Format as int
        menuChoice = int(menuChoice)

        if menuChoice == 1:

            fextension = '*.txt'

            # Set directory name
            tdirectory = input('  Target directory name? ')
            tdirectory = str(tdirectory)
            print ('')

            # Set string to Find
            fstring = input('  String to find? (Ctrl + V) ')
            fstring = str(fstring)
            print ('')

            # Set string to Replace With
            rstring = input('  Replace with string? (Ctrl + V) ')
            rstring = str(rstring)
            print ('')

            # Report initial status
            print ('  Searching for occurrences of ' + fstring)
            print ('  Please wait...')
            print ('')

            # Call findReplace function
            findReplace('./' + tdirectory, fstring, rstring, fextension)

# Initialize program
mainMenu()

# Action Sample...
#findReplace("in this dir", "find string 1", "replace with string 2", "of this file extension")

# Confirm.
#print("done.")

Answer 1

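You're checking that the input is a '.txt' file, so there is no need to worry about passing 'rb' or 'wb' to open().

You said you don't want to allocate N bytes for an N-byte file, out of concern that N might sometimes be quite large. It is better to limit memory allocation to the size of the longest text line rather than the size of the largest file. Let's break out a helper function. Delete / replace these lines: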

            #Open file correspondent to target filepath.
            with open(filepath) as f:
                # Read it into memory.
                s = f.read()
            # Find and replace all occurrences of (find).
            s = s.replace(find, replace)
            # Write these new changes to the target file path.
            with open(filepath, "w") as f:
                f.write(s)
                # increment search counter by one.
                searchCounter += 1

with a call to a helper function, followed by bumping the counter:

            update(filepath, find, replace)
            searchCounter += 1

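Then define the helper:

def update(filepath, find, replace, temp_fspec='temp'):
    assert temp_fspec != filepath, filepath
    with open(filepath) as fin:
        # The temp file has to be opened for writing.
        with open(temp_fspec, 'w') as fout:
            # Only one line at a time is held in memory.
            for line in fin:
                fout.write(line.replace(find, replace))
    # os.replace overwrites filepath on every platform; os.rename raises
    # on Windows when the destination already exists.
    os.replace(temp_fspec, filepath)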

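Using fileinput is beside the point, since it would catenate lines from many inputs into a single output stream, and your requirement is to associate each output with its own input. The `for line in` idiom is the important thing here, and it works the same way in fileinput as it does in the suggested update() helper.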
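For comparison, when fileinput is opened with inplace=True it redirects standard output into the file currently being read, so each file does get its own output. A minimal sketch of that variant follows; the name update_in_place is hypothetical and is not part of the script above.

```python
import fileinput


def update_in_place(filepath, find, replace):
    """Rewrite one file line by line using fileinput's in-place mode."""
    # With inplace=True, anything printed inside the loop is written
    # back to the file being read instead of to the console.
    with fileinput.input(files=[filepath], inplace=True) as lines:
        for line in lines:
            # Each line keeps its own newline, so suppress print's.
            print(line.replace(find, replace), end='')
```

It would be called exactly like update(filepath, find, replace). Memory use is still bounded by the longest line, because the loop is the same `for line in` idiom.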

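Consider putting unusual characters in temp_fspec to reduce the chance of a collision, or make it a fully qualified path on the same filesystem but above the affected subtree, to guarantee that it never collides.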
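Another way to avoid collisions is to let the tempfile module pick a name that is guaranteed not to exist yet. The sketch below assumes it is acceptable to create the temp file next to the target (which keeps it on the same filesystem); the name update_via_tempfile is hypothetical.

```python
import os
import tempfile


def update_via_tempfile(filepath, find, replace):
    """Rewrite filepath line by line through a uniquely named temp file."""
    target_dir = os.path.dirname(os.path.abspath(filepath))
    # mkstemp creates a new, uniquely named file in target_dir and
    # returns an open file descriptor together with its path.
    fd, temp_path = tempfile.mkstemp(dir=target_dir, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fout, open(filepath) as fin:
            for line in fin:
                fout.write(line.replace(find, replace))
        # Swap the finished temp file into place.
        os.replace(temp_path, filepath)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        os.remove(temp_path)
        raise
```

The '.tmp' suffix keeps the temp file from matching the '*.txt' filter. Note that mkstemp creates the file readable and writable only by the current user, so the rewritten file ends up with those permissions rather than the original ones.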

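This version will typically take a bit longer to run, especially for lengthy files full of short lines. Its maximum memory footprint should be much smaller if max file size >> max line length. If very long lines are a concern, then a binary chunking approach would be more appropriate, taking care of the case where find could span a block boundary. We needn't handle that case in the current code if we assume that find does not contain '\n' newlines.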
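A rough sketch of what such a chunked approach might look like is shown here. It is an illustration under stated assumptions, not a drop-in part of the script: both streams are opened in binary mode, find and replace are bytes (for example fstring.encode()), and the name replace_chunked and the default chunk size are hypothetical. The last len(find) - 1 bytes of each buffer are held back so that a match straddling two chunks is still found.

```python
def replace_chunked(fin, fout, find, replace, chunk_size=64 * 1024):
    """Copy binary stream fin to fout, replacing find with replace."""
    if not find:
        raise ValueError('find must not be empty')
    keep = len(find) - 1  # longest tail that could start a split match
    buf = b''
    while True:
        chunk = fin.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        out = []
        pos = 0
        # Replace every complete match currently in the buffer.
        while True:
            i = buf.find(find, pos)
            if i == -1:
                break
            out.append(buf[pos:i])
            out.append(replace)
            pos = i + len(find)
        # Emit everything except a tail that might be the beginning of
        # a match completed by the next chunk.
        safe_end = max(pos, len(buf) - keep)
        out.append(buf[pos:safe_end])
        fout.write(b''.join(out))
        buf = buf[safe_end:]
    # Whatever is left cannot contain a full match.
    fout.write(buf)
```

Memory use is then bounded by roughly chunk_size plus len(find), regardless of line length.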

We can simplify the two versions of your clear-screen routine down to one by phrasing it like this:

def clear_console():
    clear = 'cls' if os.name == 'nt' else 'clear'
    subprocess.call(clear, shell=True)
    return

Answer 2

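I believe you can also take a look at the glob library. It will help you go through directories and subdirectories, and you can also update file names. I found a Stack Overflow link that is relevant to your question: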

How can I search sub-folders using glob.glob module in Python?
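As a minimal sketch of that idea (assuming Python 3.5 or later, where glob accepts recursive=True; the function name find_text_files is hypothetical), the '**' pattern matches the top directory as well as any depth of subdirectories:

```python
import glob
import os


def find_text_files(directory, pattern='*.txt'):
    """Return every file under directory, at any depth, matching pattern."""
    # With recursive=True, '**' matches zero or more directory levels.
    search = os.path.join(directory, '**', pattern)
    return sorted(glob.glob(search, recursive=True))
```

The returned paths could then be fed one by one to whatever routine performs the replacement. By default glob skips names that start with a dot.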
