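I want to filter a list of strings in Python with a regular expression. In the case below, I only want to keep the files with the ".npy" extension.
Code that doesn't work: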
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = list(filter(regex.match, files))
print(selected_files)
The same regex works for me in Ruby:
selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }
What's wrong with the Python code?
Answer 0 (score: 30)
selected_files = filter(regex.match, files)
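re.match('regex') is equivalent to re.search('^regex') (without the re.MULTILINE flag). You can also think of it as a regex version of text.startswith('regex'). It only checks whether the string starts with a match for the pattern.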
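The sketch below checks that equivalence on one of the paths from the question. The variable names are only for illustration. The path starts with '/', so any check that is anchored at position 0 fails.

```python
import re

pattern = r'_x\d+_y\d+\.npy'
path = '/a/b/c/la_seg_x005_y003.npy'

# re.match only tries position 0; the path starts with '/', so it fails.
match_result = re.match(pattern, path)
# re.search with a leading '^' is anchored the same way (no re.MULTILINE here).
anchored_search = re.search('^' + pattern, path)
# Plain re.search scans the whole string and finds the pattern in the middle.
search_result = re.search(pattern, path)

print(match_result)           # None
print(anchored_search)        # None
print(search_result.group())  # _x005_y003.npy
```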
So use re.search() instead:
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
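# In Python 3, filter() returns a lazy iterator; list() materializes it so print shows the list
selected_files = list(filter(regex.search, files))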
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
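One caveat is not part of the original answer. The unanchored pattern also accepts any path that merely contains _xN_yN.npy somewhere, such as the hypothetical backup file below. Adding $ ties the extension to the end of the string. This minimal sketch uses a list comprehension instead of filter(), which gives the same result.

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x003_y003.npy.bak']  # hypothetical extra entry

# Unanchored: also accepts names that merely contain '_xN_yN.npy'.
loose = re.compile(r'_x\d+_y\d+\.npy')
# Anchored: '$' matches at the end of the string (or just before a trailing newline).
strict = re.compile(r'_x\d+_y\d+\.npy$')

loose_hits = [f for f in files if loose.search(f)]
strict_hits = [f for f in files if strict.search(f)]
print(loose_hits)
print(strict_hits)
```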
If you only want all the .npy files, just use str.endswith():
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
selected_files = list(filter(lambda x: x.endswith('.npy'), files))
print(selected_files)
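Another option, which is not from the original answer, is to compare the path's extension directly with pathlib. This sketch assumes '/'-separated paths like the ones in the question. PurePosixPath only parses the string and never touches the filesystem.

```python
from pathlib import PurePosixPath

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png',
         '/a/b/c/la_seg_x004_y003.npy',
         '/a/b/c/la_seg_x003_y003.png',
         '/a/b/c/la_seg_x003_y003.npy']

# .suffix is the final extension including the dot, e.g. '.npy' or '.png'.
selected_files = [f for f in files if PurePosixPath(f).suffix == '.npy']
print(selected_files)
```

Unlike endswith('.npy'), this compares the whole final extension. For example, a name like 'x.npy.bak' has the suffix '.bak' and is rejected.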
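Answer 1 (score: 4)
Just use search. match only looks for a match at the beginning of the string, while search finds a match anywhere in the string. Requiring the whole string to match is what re.fullmatch does.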
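The comparison below runs match, fullmatch and search on a few made-up sample strings. It shows how each method anchors the pattern. re.fullmatch is available from Python 3.4 onward.

```python
import re

pattern = re.compile(r'_x\d+_y\d+\.npy')
samples = ['_x5_y3.npy', '_x5_y3.npy.bak', '/a/b/c/la_seg_x5_y3.npy']

results = {}
for text in samples:
    results[text] = (
        bool(pattern.match(text)),      # anchored at the start; trailing text allowed
        bool(pattern.fullmatch(text)),  # the entire string must match
        bool(pattern.search(text)),     # a match may start anywhere
    )
    m, fm, s = results[text]
    print(f'{text!r}: match={m} fullmatch={fm} search={s}')
```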
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
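# In Python 3, filter() returns a lazy iterator; list() materializes it so print shows the list
selected_files = list(filter(regex.search, files))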
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
Answer 2 (score: 2)
re.match() only looks for a match at the beginning of the string. You can use re.search() instead.
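Ruby's f =~ /.../ succeeds when the pattern matches anywhere in f, which corresponds to re.search in Python. The Ruby one-liner from the question therefore translates roughly to the list comprehension below. This is a sketch, not the only idiomatic form.

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png',
         '/a/b/c/la_seg_x004_y003.npy',
         '/a/b/c/la_seg_x003_y003.png',
         '/a/b/c/la_seg_x003_y003.npy']

# Ruby: files.select { |f| f =~ /_x\d+_y\d+\.npy/ }
# re.search (not re.match) mirrors =~ because it may match anywhere in f.
selected = [f for f in files if re.search(r'_x\d+_y\d+\.npy', f)]
print(selected)
```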
Answer 3 (score: 1)