Question

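I want to filter a list of strings in Python using a regular expression. In the following case, I want to keep only the files with a ".npy" extension.

The code that doesn't work: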

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = filter(regex.match, files)
print(selected_files)

The same regular expression works for me in Ruby:

selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }

What is wrong with the Python code?

Answer 1

selected_files = filter(regex.match, files)

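re.match('regex') behaves like re.search('^regex') (without re.MULTILINE), or like a regular-expression version of text.startswith('regex'): it only checks whether the string starts with a match for the pattern.

So, use re.search() instead: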

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

# In Python 3, filter() returns a lazy iterator, so wrap it in list() before printing.
selected_files = list(filter(regex.search, files))
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

If you just want all the .npy files, simply use str.endswith():

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]


# In Python 3, filter() returns a lazy iterator, so wrap it in list() before printing.
selected_files = list(filter(lambda x: x.endswith('.npy'), files))

print(selected_files)
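
As a possible alternative to filter() with a lambda, the same suffix check can be written as a list comprehension, or with pathlib if you prefer comparing the extension itself. This is only a sketch on a shortened copy of the list from the question; the variable names by_endswith and by_suffix are made up for illustration.

```python
from pathlib import PurePosixPath

files = [
    '/a/b/c/la_seg_x005_y003.png',
    '/a/b/c/la_seg_x005_y003.npy',
    '/a/b/c/la_seg_x004_y003.png',
]

# A list comprehension builds the list directly, so there is no
# filter object to unwrap in Python 3.
by_endswith = [f for f in files if f.endswith('.npy')]

# PurePosixPath.suffix is the final extension, including the leading dot.
by_suffix = [f for f in files if PurePosixPath(f).suffix == '.npy']

print(by_endswith)
print(by_suffix)
```

Both variants ignore the _x…_y… part of the name, so they are only suitable when the extension alone decides which files to keep.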

Answer 2

Just use search: match only tries to match at the beginning of the string, whereas search matches anywhere in the string.

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

# In Python 3, filter() returns a lazy iterator, so wrap it in list() before printing.
selected_files = list(filter(regex.search, files))
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']

Answer 3

re.match() looks for a match at the beginning of the string. You can use re.search() instead.
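
To make the difference concrete, here is a small sketch that applies the pattern from the question to one of its paths. The names at_start and anywhere are illustrative only.

```python
import re

regex = re.compile(r'_x\d+_y\d+\.npy')
path = '/a/b/c/la_seg_x005_y003.npy'

# match() only tries position 0, where the string starts with '/a/b/c/...',
# so the pattern cannot match there and the result is None.
at_start = regex.match(path)

# search() scans forward and finds the pattern further into the string.
anywhere = regex.search(path)

print(at_start)
print(anywhere.group())
```

Because filter() keeps only items for which the function returns a truthy value, the None from match() is what causes every path to be dropped.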

Answer 4

If you use match, the pattern must match starting at the very beginning of the input. Either extend the regular expression:

regex = re.compile(r'.*_x\d+_y\d+\.npy')

which matches:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

or use re.search, which will

Scan through string looking for the first location where the regular expression pattern produces a match [...]
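
The sketch below compares the two options and also shows where they differ from a whole-string match. It assumes Python 3.4 or later for fullmatch(); the '.npy.bak' path is a hypothetical example that is not in the question, added to show what happens when the pattern is not anchored at the end.

```python
import re

path = '/a/b/c/la_seg_x005_y003.npy'
backup = '/a/b/c/la_seg_x005_y003.npy.bak'  # hypothetical extra file

prefixed = re.compile(r'.*_x\d+_y\d+\.npy')   # extended pattern, for match()
anchored = re.compile(r'_x\d+_y\d+\.npy$')    # end-anchored, for search()

# match() anchors only at the start, so trailing text is still accepted.
match_path = prefixed.match(path) is not None
match_backup = prefixed.match(backup) is not None

# fullmatch() requires the pattern to cover the entire string.
full_path = prefixed.fullmatch(path) is not None
full_backup = prefixed.fullmatch(backup) is not None

# search() with a trailing $ accepts only names that end in .npy.
search_path = anchored.search(path) is not None
search_backup = anchored.search(backup) is not None

print(match_path, match_backup)
print(full_path, full_backup)
print(search_path, search_backup)
```

If names such as the hypothetical backup file can occur, anchoring the end with $ or using fullmatch() is the safer choice.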

Filter strings in a list by regular expression
