Question

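I am trying to use a regular expression to find text files that follow a certain naming convention, but so far I have had no success.

The naming convention is file_[year]-[month]-[day].txt (for example, file_2010-09-15.txt).

Here is what I have so far: ^(file_)[0-9]{4}[-][0-9]{2}[-][0-9]{2}(\.txt)$

I am trying to use it in my code like this: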

    for text_file in os.listdir(path):
        if fnmatch.fnmatch(text_file, '^(file_)[0-9]{4}[-][0-9]{2}[-][0-9]{2}(\.txt)$'):
            # print number of files found

Answer 1

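I think the problem is the kind of pattern that fnmatch expects. The documentation states:

This module provides support for Unix shell-style wildcards, which are not the same as regular expressions (which are documented in the re module). The special characters used in shell-style wildcards are: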

Pattern Meaning
*   matches everything
?   matches any single character
[seq]   matches any character in seq
[!seq]  matches any character not in seq
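
To see why the original call never matches, it can help to run both patterns against the example file name. This is only a minimal sketch; the variable names are illustrative and do not come from the question.

```python
import fnmatch

name = 'file_2010-09-15.txt'

# The regex from the question, handed to fnmatch as if it were a wildcard pattern.
regex_style = r'^(file_)[0-9]{4}[-][0-9]{2}[-][0-9]{2}(\.txt)$'
# A shell-style wildcard pattern for the same naming convention.
glob_style = 'file_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].txt'

# fnmatch treats ^, (, ), {4}, \ and $ as literal characters, so the first test fails.
regex_result = fnmatch.fnmatch(name, regex_style)
glob_result = fnmatch.fnmatch(name, glob_style)

print(regex_result, glob_result)  # False True
```

Only *, ?, [seq] and [!seq] are special to fnmatch, which is why the repetition counts have to be spelled out as repeated [0-9] groups.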

You can keep using fnmatch and simply change the pattern to one it supports, i.e.:

matches = 0
for text_file in os.listdir(path):
    if fnmatch.fnmatch(text_file, 'file_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].txt'):
        matches += 1
print(matches)  # number of files found

Alternatively, I would suggest using re.match, like so:

regex = re.compile(r'^(file_)[0-9]{4}[-][0-9]{2}[-][0-9]{2}(\.txt)$')
for text_file in os.listdir(path):
    if regex.match(text_file):
        print(text_file)
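
Since the question wants the number of files found, the re approach could be wrapped in a small helper. The following is a sketch under assumptions: the function name count_matching_files and the constant FILE_PATTERN are made up for illustration, and the pattern drops the capture groups because they are not needed for counting.

```python
import os
import re

# Same naming convention as the question's regex; fullmatch() must consume the
# whole name, so explicit ^ and $ anchors are not needed here.
FILE_PATTERN = re.compile(r'file_[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt')

def count_matching_files(path):
    """Return how many entries in `path` look like file_YYYY-MM-DD.txt."""
    return sum(1 for entry in os.listdir(path) if FILE_PATTERN.fullmatch(entry))
```

Calling print(count_matching_files(path)) would then report the count for the directory used in the question.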

Answer 2

fnmatch translates the wildcard pattern into a regex for Python's re module. Take a look at its source code. Basically, the supported shortcuts are:

Patterns are Unix shell style:
*       matches everything
?       matches any single character
[seq]   matches any character in seq
[!seq]  matches any char not in seq
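
The translation step can be inspected with fnmatch.translate, which is part of the standard library. The sketch below only illustrates the idea; the variable names are assumptions, and the exact text of the generated regex differs between Python versions, so it is printed rather than quoted.

```python
import fnmatch
import re

glob_pattern = 'file_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].txt'

# fnmatch.translate returns the regex source that fnmatch builds from a
# shell-style pattern.
regex_source = fnmatch.translate(glob_pattern)
print(regex_source)

# The translated regex accepts the example file name and rejects a near miss.
matches_example = re.match(regex_source, 'file_2010-09-15.txt') is not None
matches_short_year = re.match(regex_source, 'file_10-09-15.txt') is not None
```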

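Your pattern should be: 'file_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].txt'

Alternatively, you can use re directly, without fnmatch. Take the code below as a starting point; there is still room for improvement, such as checking that the year is valid, that the month is between 1 and 12, and that the day is between 1 and 28, 29, 30, or 31: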

import re

example_file = 'file_2010-09-15.txt'

# raw string, so that \d and \. reach the regex engine unchanged
myregex = r'file_\d\d\d\d-\d\d-\d\d\.txt'

result = re.match(myregex, example_file)

print(result.group(0))
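
One possible way to add the date checks mentioned above is to capture the three numeric fields and let datetime.date reject impossible combinations. This is a sketch, not the answer's own method; the names DATED_FILE and has_valid_date are hypothetical.

```python
import datetime
import re

# Capture year, month and day so they can be checked as a real calendar date.
DATED_FILE = re.compile(r'file_([0-9]{4})-([0-9]{2})-([0-9]{2})\.txt')

def has_valid_date(filename):
    """Return True if filename is file_YYYY-MM-DD.txt with a real calendar date."""
    match = DATED_FILE.fullmatch(filename)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # datetime.date raises ValueError for dates such as month 13 or 30 February.
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True
```

With this helper, has_valid_date('file_2010-09-15.txt') is accepted, while names such as 'file_2010-02-30.txt' or 'file_2010-13-01.txt' are rejected even though they fit the digit pattern.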
