Question

I have a directory containing files whose names include a date string:

file_type_1_20140722_foo.txt
file_type_two_20140723_bar.txt
filetypethree20140724qux.txt

I need to extract these date strings from the file names and store them in an array:

['20140722', '20140723', '20140724']

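However, they can appear at different positions in the file name, so I can't simply use substring notation and extract them directly. In the past, I've done something similar in Bash like this: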

date=$(echo $file | egrep -o '[[:digit:]]{8}' | head -n1)

But I can't use Bash for this because it sucks at math (I need to be able to add and subtract floating-point numbers). I've tried glob.glob() and re.match(), but both return empty sets:

>>> dates = [file for file in sorted(os.listdir('.')) if re.match("[0-9]{8}", file)]
>>> print(dates)
[]

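I know the problem is that it's looking for complete file names that are eight digits long, but I have no idea how to make it look for substrings instead. Any ideas?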

Answer 1

>>> import re
>>> import os
>>> [date for file in os.listdir('.') for date in re.findall(r"(\d{8})", file)]
['20140722', '20140723']

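Note that if a file name contains a 9-digit substring, only the first 8 digits are matched. If a file name contains a 16-digit substring, there will be 2 non-overlapping matches.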
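
The two edge cases in that note can be checked with a short sketch. The file names with 9- and 16-digit runs are made up for illustration, and the lookaround pattern is only one possible way to require exactly eight digits; it is an assumption, not part of the answer above.

```python
import re

# Hypothetical file names, chosen only to show the edge cases.
nine = "report_201407221_foo.txt"            # 9-digit run
sixteen = "report_2014072220140723_foo.txt"  # 16-digit run

print(re.findall(r"\d{8}", nine))     # ['20140722']
print(re.findall(r"\d{8}", sixteen))  # ['20140722', '20140723']

# Assumption: if only runs of exactly eight digits should count,
# lookarounds can reject runs that have a digit on either side.
exact = re.compile(r"(?<!\d)\d{8}(?!\d)")
print(exact.findall(nine))                            # []
print(exact.findall("file_type_1_20140722_foo.txt"))  # ['20140722']
```

Whether the stricter pattern is wanted depends on what else can appear in the file names; for the three names in the question both patterns give the same result.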

Answer 2

re.match matches only at the beginning of the string; re.search matches the pattern anywhere in it. Alternatively, you can try this:

extract_dates = re.compile("[0-9]{8}").findall
dates = [dates[0] for dates in sorted(
    extract_dates(filename) for filename in os.listdir('.')) if dates]
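
The difference between the two functions can be seen on one of the file names from the question. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import re

name = "file_type_1_20140722_foo.txt"
pattern = re.compile(r"[0-9]{8}")

# match() only tries at position 0, and the name starts with "file", so it fails.
at_start = pattern.match(name)
print(at_start)  # None

# search() scans the whole string and finds the digits wherever they are.
anywhere = pattern.search(name)
print(anywhere.group(0))  # 20140722

# match() succeeds only when the eight digits come first.
print(pattern.match("20140722_foo.txt").group(0))  # 20140722
```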

Answer 3

Your regular expression looks fine, but you should use re.search instead of re.match, so that it searches for the expression anywhere in the string:

import re
r = re.compile("[0-9]{8}")
m = r.search(filename)  # filename is one file name, e.g. from os.listdir('.')
if m:
    print(m.group(0))
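
To get the list the question asks for, the same search can be applied to every name. The following is a sketch under assumptions: the helper name first_date and the extra entry notes.txt are hypothetical, and the names are hard-coded here rather than read with os.listdir('.') so the example is self-contained.

```python
import re

DATE_RE = re.compile(r"[0-9]{8}")

def first_date(filename):
    """Return the first eight-digit run in filename, or None if there is none."""
    m = DATE_RE.search(filename)
    return m.group(0) if m else None

# The three names from the question plus one hypothetical name without a date.
filenames = [
    "file_type_1_20140722_foo.txt",
    "file_type_two_20140723_bar.txt",
    "filetypethree20140724qux.txt",
    "notes.txt",
]

# Names without a date yield None and are skipped.
dates = [d for d in map(first_date, filenames) if d is not None]
print(dates)  # ['20140722', '20140723', '20140724']
```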

Extract substring from filename in Python?

3 answers: