Question

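I wrote a simple script that runs on a folder and loops over all the files in it to do some processing (the actual processing is not important).

I have a folder. This folder contains several different folders. Inside those folders is a variable number of files that I want to run my script on. I am struggling to adapt my code to do this.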

Previously, the file structure was:

Folder
  Html1
  Html2
  Html3
  ...

Now it is:

Folder
  Folder1
    Html1
  Folder2
    Html2
    Html3

I still want to run the code on all of the HTML files.

Here is the result of my attempt to do this:

error on line 25, in CleanUpFolder
    orig_f.write(soup.prettify().encode(soup.original_encoding))
TypeError: encode() argument 1 must be string, not None

def CleanUpFolder(dir):
    do = dir
    dir_with_original_files = dir
    for root, dirs, files in os.walk(do):
        for d in dirs:
            for f in files:
                print f.title()
                if f.endswith('~'): #you don't want to process backups
                    continue
                original_file = os.path.join(root, f)
                with open(original_file, 'w') as orig_f, \
                    open(original_file, 'r') as orig_f2:
                    soup = BeautifulSoup(orig_f2.read())
                    for t in soup.find_all('td', class_='TEXT'):
                        t.string.wrap(soup.new_tag('h2'))

                # This is where you create your new modified file.
                    orig_f.write(soup.prettify().encode(soup.original_encoding))

CleanUpFolder('C:\Users\FOLDER')

What am I missing? The main thing I am unsure about is how the line

    for root, dirs, files in os.walk(do):

is used/understood in this context.

Answer 1

Here I have split your function into two separate functions and removed the redundant code:

import os

def clean_up_folder(dir):
    """Run the clean up process on dir, recursively."""
    for root, dirs, files in os.walk(dir):
        for f in files:
            print f.title()
            if not f.endswith('~'): #you don't want to process backups
                clean_up_file(os.path.join(root, f))

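This fixes the indentation problems, and makes it easier to test the functions and isolate any future errors. I have also removed the loop over dirs, since walk already descends into each directory for you (and the loop meant you would skip all the files in any dirs that don't contain any sub-dirs).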
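To see why the loop over dirs loses files, here is a minimal sketch that rebuilds the layout from the question in a temporary directory and walks it both ways. The helper name files_seen and its loop_over_dirs flag are made up for this illustration, and the snippet uses Python 3 syntax rather than the Python 2 of the original code.

```python
import os
import tempfile

def files_seen(top, loop_over_dirs):
    """Return the sorted file names reached while walking `top` (hypothetical helper)."""
    seen = []
    for root, dirs, files in os.walk(top):
        if loop_over_dirs:
            # Mimics the question: the body runs once per sub-directory,
            # so it never runs in a directory where `dirs` is empty.
            for d in dirs:
                seen.extend(files)
        else:
            # os.walk visits every directory itself; just use `files`.
            seen.extend(files)
    return sorted(seen)

# Rebuild the layout from the question inside a temporary directory.
top = tempfile.mkdtemp()
layout = [("Folder1", "Html1"), ("Folder2", "Html2"), ("Folder2", "Html3")]
for folder, name in layout:
    os.makedirs(os.path.join(top, folder), exist_ok=True)
    open(os.path.join(top, folder, name), "w").close()

with_dirs_loop = files_seen(top, loop_over_dirs=True)
without_dirs_loop = files_seen(top, loop_over_dirs=False)
print(with_dirs_loop)     # []
print(without_dirs_loop)  # ['Html1', 'Html2', 'Html3']
```

Folder1 and Folder2 have no sub-directories, so dirs is empty when walk reaches them and the inner loop body never runs there.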

from bs4 import BeautifulSoup

def clean_up_file(original_file):
    """Clean up the original_file."""
    with open(original_file) as orig_f2:
        soup = BeautifulSoup(orig_f2.read())
    for t in soup.find_all('td', class_='TEXT'):
        t.string.wrap(soup.new_tag('h2'))
    with open(original_file, 'w') as orig_f:
        # This is where you create your new modified file.
        orig_f.write(soup.prettify().encode(soup.original_encoding))

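Note that I have separated the two opens of original_file, so you don't accidentally overwrite it before reading it. There is no need to have it open for reading and writing at the same time.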
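The effect of opening the same path for writing first can be checked without BeautifulSoup. The sketch below (Python 3 syntax, with a made-up temporary file and a hypothetical helper name) shows that open(path, 'w') truncates the file immediately, so the reader gets an empty string. That is presumably why soup.original_encoding was None in the traceback, although I have not verified that part against BeautifulSoup itself.

```python
import os
import tempfile

original = "<td class='TEXT'>hello</td>"

def read_while_open_for_writing(path):
    """Open for 'w' and 'r' together, as in the question, then read (hypothetical helper)."""
    with open(path, "w") as orig_f, open(path, "r") as orig_f2:
        # The 'w' open has already truncated the file at this point.
        return orig_f2.read()

# Create a throwaway file with some content in it.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w") as f:
    f.write(original)

content_both_open = read_while_open_for_writing(path)

# Restore the content, then read it before any 'w' open, as in clean_up_file.
with open(path, "w") as f:
    f.write(original)
with open(path) as f:
    content_read_first = f.read()

os.remove(path)
print(repr(content_both_open))   # ''
print(content_read_first == original)  # True
```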

I don't have BeautifulSoup installed, so I can't test this any further, but it should let you narrow the problem down to a specific file.
