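Filter strings in a list with a regular expression

Date: 2015-12-06 13:30:23

Tags: python regex

I want to filter a list of strings in Python using a regular expression. In the following case, only the files with the ".npy" extension should be kept.

Code that does not work: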

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = filter(regex.match, files)
print(selected_files)

The same regular expression works for me in Ruby:

selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }

What is wrong with the Python code?

4 answers:

Answer 0 (score: 30)

selected_files = filter(regex.match, files)

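re.match('regex') behaves like re.search('^regex'): it is the regular-expression counterpart of text.startswith('regex') and only checks whether the string starts with the pattern.

So use re.search() instead: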

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

# In Python 3, filter() returns a lazy iterator, so wrap it in list() before printing.
selected_files = list(filter(regex.search, files))
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

If you just want all the .npy files, simply use str.endswith():

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]


# list() materializes the iterator that filter() returns in Python 3.
selected_files = list(filter(lambda x: x.endswith('.npy'), files))

print(selected_files)
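
If the goal really is "files whose extension is .npy", comparing the parsed suffix is another option. The following is a minimal sketch using the standard library's pathlib; the shortened `files` list and the name `npy_files` are chosen here for illustration, not taken from the answer.

```python
from pathlib import PurePosixPath

# A shortened copy of the list from the question.
files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png']

# Keep only the paths whose final extension is exactly '.npy'.
npy_files = [f for f in files if PurePosixPath(f).suffix == '.npy']

print(npy_files)  # ['/a/b/c/la_seg_x005_y003.npy']
```

PurePosixPath only parses the string and never touches the file system, so this works even if the files do not exist.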

Answer 1 (score: 4)

Just use search: match only matches at the beginning of the string, whereas search matches anywhere in the string.

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

# list() is needed in Python 3, where filter() returns an iterator.
selected_files = list(filter(regex.search, files))
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
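
One detail worth noting when running this on Python 3: filter() returns a lazy iterator rather than a list, so printing it directly shows a filter object. A list comprehension is a common way to get a list in one step. This is only a sketch on a shortened copy of the question's list; the name `lazy` is illustrative.

```python
import re

# A shortened copy of the list from the question.
files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.npy']

regex = re.compile(r'_x\d+_y\d+\.npy')

# On Python 3 this is an iterator, not a list.
lazy = filter(regex.search, files)

# The list comprehension builds the list immediately.
selected_files = [f for f in files if regex.search(f)]

print(type(lazy).__name__)  # filter
print(selected_files)
```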

Answer 2 (score: 2)

re.match() looks for a match at the beginning of the string. You can use re.search() instead.
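
The difference is easy to see on a single path from the question. This is a minimal sketch; the names `pattern`, `path`, `match_result` and `search_result` are chosen here for illustration.

```python
import re

# Pattern and path are taken from the question.
pattern = re.compile(r'_x\d+_y\d+\.npy')
path = '/a/b/c/la_seg_x005_y003.npy'

# match() only tries position 0, where the path starts with '/'.
match_result = pattern.match(path)

# search() scans forward until the pattern matches somewhere in the string.
search_result = pattern.search(path)

print(match_result)           # None
print(search_result.group())  # _x005_y003.npy
```

Because None is falsy, filter(regex.match, files) discards every path in the question's list.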

Answer 3 (score: 1)

If you use match, the pattern must match starting at the beginning of the input. Either extend the regular expression:

regex = re.compile(r'.*_x\d+_y\d+\.npy')

which matches:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

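Or use re.search, which, according to the documentation, will:

Scan through string looking for the first location where the regular expression pattern produces a match [...]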
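
Note that match anchors only the start of the string, not the end. If the pattern should cover the whole input, re.fullmatch (available since Python 3.4) does that. The sketch below illustrates the difference; `bak_path` is a made-up path that does not appear in the question.

```python
import re

# The extended pattern from this answer.
pattern = re.compile(r'.*_x\d+_y\d+\.npy')

npy_path = '/a/b/c/la_seg_x005_y003.npy'
bak_path = '/a/b/c/la_seg_x005_y003.npy.bak'  # hypothetical, for illustration only

# match() succeeds as soon as a prefix of the string fits the pattern.
print(bool(pattern.match(npy_path)))      # True
print(bool(pattern.match(bak_path)))      # True: the trailing '.bak' is ignored

# fullmatch() requires the pattern to consume the entire string.
print(bool(pattern.fullmatch(npy_path)))  # True
print(bool(pattern.fullmatch(bak_path)))  # False
```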