Question

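I want to implement a file-reader script (covering folders and subfolders) that detects certain tags and removes them from the files.

The files are .cpp, .h, .txt and .xml, and there are hundreds of them under the same folder.

I don't know Python, but people have told me I could do this easily with it.

Example:

My main folder is A: C:\A

Inside A, I have folders (B, C, D) and some files: A.cpp, A.h, A.txt and A.xml. In B I have folders B1, B2, B3, and some of them have more subfolders and .cpp, .xml and .h files....

The .xml files contain some tags, for example
The .h and .cpp files contain another kind of tag, for example //$TAG some text$
The .txt files contain tags in a different format: #$This is my tag$

A tag always starts and ends with a $ symbol, but it always has a comment character (//,

The idea is to run one script that removes all the tags from all the files, so the script has to:

Read the folders and subfolders
Open each file and look for tags
If any are there, remove them and save the file with the changes

What I have: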

import  os

for root, dirs, files in os.walk(os.curdir):

 if files.endswith('.cpp'):
  %Find //$ and delete until next $
 if files.endswith('.h'):
  %Find //$ and delete until next $
 if files.endswith('.txt'):
  %Find #$ and delete until next $
 if files.endswith('.xml'):
  %Find <!-- $ and delete until next $ and -->

Answer 1

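The general solution is:

Use the os.walk() function to traverse the directory tree.
Iterate over the filenames and use fn_name.endswith('.cpp') with if/elif to determine which kind of file you are working with
Use the re module to create a regular expression that can determine whether a line contains your tag
Open the target file and a temporary file (use the tempfile module). Iterate over the source file line by line and write the filtered lines to the temporary file.
If any lines were replaced, use os.unlink() plus os.rename() to replace the original file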
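
The steps above can be sketched roughly as follows, assuming Python 3. The names TAG_PATTERNS, strip_tags and strip_tree are hypothetical, and the patterns are only a guess at the tag formats described in the question (here the whole comment, marker included, is removed). os.replace() stands in for the os.unlink() plus os.rename() step because it overwrites the destination in a single call.

```python
import os
import re
import tempfile

# Hypothetical patterns, guessed from the tag formats described in the question.
TAG_PATTERNS = {
    '.cpp': re.compile(r'//\s*\$[^$\n]*\$'),
    '.h': re.compile(r'//\s*\$[^$\n]*\$'),
    '.txt': re.compile(r'#\s*\$[^$\n]*\$'),
    '.xml': re.compile(r'<!--\s*\$[^$\n]*\$\s*-->'),
}


def strip_tags(path, pattern):
    """Rewrite path without tags; return True if anything was changed."""
    changed = False
    directory = os.path.dirname(path) or '.'
    # Create the temporary file next to the original so the final
    # replace stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    # newline='' keeps the original line endings untouched.
    with os.fdopen(fd, 'w', newline='') as dst, open(path, 'r', newline='') as src:
        for line in src:
            new_line = pattern.sub('', line)
            changed = changed or new_line != line
            dst.write(new_line)
    if changed:
        os.replace(tmp_path, path)  # overwrite the original with the filtered copy
    else:
        os.unlink(tmp_path)  # nothing to do, discard the copy
    return changed


def strip_tree(top):
    """Walk top recursively and strip tags from files with a known extension."""
    modified = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            for ext, pattern in TAG_PATTERNS.items():
                if name.endswith(ext):
                    full = os.path.join(root, name)
                    if strip_tags(full, pattern):
                        modified.append(full)
                    break
    return modified
```

Calling strip_tree(r'C:\A') would then return the list of files that were rewritten. It is worth trying this on a copy of the folder first, since the sketch keeps no backup and a rewritten file takes on the temporary file's permissions.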

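For someone proficient in Python this is a trivial exercise, but for someone new to the language it may take a few hours to get working. You probably couldn't ask for a better task to introduce you to the language. Good luck!

-----Update-----

The files value returned by os.walk is a list, so you need to iterate over it as well. Also, it contains only the base names of the files. You need to combine each one with the root value using os.path.join() to turn it into a full path name. Try this: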

for root, d, files in os.walk('.'): 
    for base_filename in files: 
        full_name = os.path.join(root, base_filename)
        if full_name.endswith('.h'):
            print full_name, 'is a header!'
        elif full_name.endswith('.cpp'):
            print full_name, 'is a C++ source file!'

If you are using Python 3, the print statements need to be function calls, but the general idea stays the same.
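
For reference, a Python 3 flavoured sketch of the same loop might look like this. The function name classify_sources is hypothetical; it yields the results instead of printing them so the caller decides what to do with each file.

```python
import os


def classify_sources(top='.'):
    """Yield (full_path, description) for each header or C++ source under top."""
    for root, _dirs, files in os.walk(top):
        for base_filename in files:
            # files holds base names only, so join each with root.
            full_name = os.path.join(root, base_filename)
            if full_name.endswith('.h'):
                yield full_name, 'is a header!'
            elif full_name.endswith('.cpp'):
                yield full_name, 'is a C++ source file!'
```

Looping over classify_sources('.') and calling print(full_name, description) on each pair reproduces the output of the snippet above.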

Answer 2

Try something like this:

import os
import re

# Python's re module only accepts fixed-width look-behind, so the comment
# marker is captured in group 1 and written back instead.
CPP_TAG_RE = re.compile(r'(// *)\$[^$]+\$')

tag_REs = {
    '.h': CPP_TAG_RE,
    '.cpp': CPP_TAG_RE,
    '.xml': re.compile(r'(<!-- *)\$[^$]+\$(?= *-->)'),
    '.txt': re.compile(r'(# *)\$[^$]+\$'),
}

def process_file(filename, regex):
    # Set up.
    tempfilename = filename + '.tmp'
    infile = open(filename, 'r')
    outfile = open(tempfilename, 'w')

    # Filter the file: keep the comment marker (group 1), drop the $...$ tag.
    for line in infile:
        outfile.write(regex.sub(r'\1', line))

    # Clean up.
    infile.close()
    outfile.close()

    # Enable only one of the two following lines.
    os.rename(filename, filename + '.orig')
    #os.remove(filename)

    os.rename(tempfilename, filename)

def process_tree(starting_point=os.curdir):
    for root, d, files in os.walk(starting_point): 
        for filename in files:
            # Get rid of `.lower()` in the following if case matters.
            ext = os.path.splitext(filename)[1].lower()
            if ext in tag_REs:
                process_file(os.path.join(root, filename), tag_REs[ext])

The nice thing about os.path.splitext is that it does the right thing for filenames that start with a dot (.).
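
A small sketch of that behaviour, with a hypothetical helper named extension_of that mirrors the extension lookup in process_tree:

```python
import os


def extension_of(filename):
    """Return the lower-cased extension, or '' when there is none."""
    return os.path.splitext(filename)[1].lower()


print(extension_of('A.CPP'))       # .cpp
print(repr(extension_of('.cpp')))  # '' (the leading dot belongs to the name)
print('.cpp'.endswith('.cpp'))     # True
```

A file literally named .cpp has no extension as far as os.path.splitext is concerned, so it is skipped, whereas a plain endswith('.cpp') check would pick it up.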

How to write a tag-removal script in Python
