Question

I have this regular expression:

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'

I want to apply it to a folder of txt files and return the matches for each document as a list, one per line. Something like this:

[pattern of the regex 1]
[pattern of the regex 2]
...
[pattern of the regex n-1]
[pattern of the regex n]

So this is what I tried:

import glob
import os
import re

directory_ = '/Users/user/path/folder_txts/'
regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'

def retrive(directory, a_regex):
    for filename in glob.glob(os.path.join(directory, '*.txt')):
        with open(filename, 'r') as file:
            important_stuff = re.findall(a_regex, file.read())
            my_list = [tuple([j.split()[0] for j in i]) for i in important_stuff]
            print my_list

This is the output:

print retrive(directory_, regex_)
['']
['']
...
['']

This is wrong, because the output should look like this:

[('string', 'string', 'string'), ('string', 'string', 'string')]
[('string', 'string', 'string'), ('string', 'string', 'string')]
...
[('string', 'string', 'string'), ('string', 'string', 'string')]

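How can I apply the regex above to every txt file in the directory and return the results as lists, sorted alphabetically by file name? This is an example of one of the txt files.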

Answer 1

The problem is in your regular expression.

Use

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VM\w+)'

instead of

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'

and, in the function, pass the re.S flag so that . also matches newlines:

important_stuff = re.findall(a_regex, file.read(), re.S)
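
Putting both changes together, the whole function might look roughly like the sketch below. It is not the asker's exact code: the name retrieve_matches is hypothetical, the function returns its results instead of printing them, and it assumes that "sorted alphabetically by file name" can be satisfied by sorting the paths that glob.glob returns (all files sit in the same directory, so sorting paths sorts file names).

```python
import glob
import os
import re

# Relaxed pattern from this answer (VM\w+ instead of VMP\w+).
REGEX = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VM\w+)'


def retrieve_matches(directory, a_regex):
    """Return one list of word tuples per .txt file, in sorted path order."""
    results = []
    # glob.glob gives no ordering guarantee, so sort the paths explicitly.
    for filename in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        with open(filename, 'r') as handle:
            # re.S lets '.' match newlines, so one match may span several lines.
            found = re.findall(a_regex, handle.read(), re.S)
        # Each captured group looks like "word TAG"; keep only the word.
        results.append([tuple(group.split()[0] for group in match)
                        for match in found])
    return results


if __name__ == '__main__':
    # Directory taken from the question; prints one list per file.
    for per_file in retrieve_matches('/Users/user/path/folder_txts/', REGEX):
        print(per_file)
```

Returning the lists rather than printing them inside the function also avoids the stray None that print retrive(...) would otherwise show, since the original function has no return statement.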
