Question

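I am using Python and trying to use a regular expression to pick the XML files out of a list of files, but I have never used regular expressions until now.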

Suppose I have a list of files:

files = ['.bash_logout', '20120910NYP.xml', '.bash_profile', '.bashrc', '.mozilla', 'testfile_248.xml']

Now I need to get the files whose names have the format 20120910NYP.xml, so I decided to write a regular expression:

import re
feedRegex = # ?
feedFiles = filter(lambda x: re.search(feedRegex, x) is not None, files)

In the code above, how should I write the regular expression for feedRegex so that it finds the XML files of that format in the list?

Edited code:

Whenever I need this functionality, I have to pass the list of files and the feedRegEx pattern to this function:

import re

def paramikoFetchLatestFeedFile(list_of_files, feedRegEx):
    # Keep only the names that match the supplied pattern.
    # (No "self" here: this is a plain function, not a method.)
    feedFiles = list(filter(lambda x: re.search(feedRegEx, x) is not None, list_of_files))
    return feedFiles

Answer 1

files = [...]
xml_files = [fn for fn in files if fn.endswith('.xml')]

Answer 2

Use glob to do the filtering for you.

Suppose you have this directory:

burhan@sandbox:~/t$ ls -l
total 0
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:17 20120101NYP.xml
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:08 20120819ABC.xml
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:09 ABC10234ABC.xml
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:15 bar.txt
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:15 blablah.gif
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:15 foo.txt
-rw-r--r-- 1 burhan burhan 0 Sep 11 09:15 hello.jpg

Here is how to filter it:

>>> import glob
>>> glob.glob("[0-9]*NYP.xml")
['20120101NYP.xml']
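
If the names are already in a Python list (for example, a listing fetched from a remote host) rather than in a local directory, the same shell-style pattern can probably be applied with the standard-library fnmatch module instead. This is only a minimal sketch using the file list from the question; the stricter eight-digit pattern is an assumption about the desired format, not something glob requires.

```python
import fnmatch

files = ['.bash_logout', '20120910NYP.xml', '.bash_profile',
         '.bashrc', '.mozilla', 'testfile_248.xml']

# Same pattern as the glob call above: one digit, then anything, then NYP.xml.
loose = fnmatch.filter(files, '[0-9]*NYP.xml')

# Stricter variant (an assumption about the desired format):
# exactly eight digits followed by NYP.xml.
strict_pattern = '[0-9]' * 8 + 'NYP.xml'
strict = fnmatch.filter(files, strict_pattern)

print(loose)
print(strict)
```

Note that the loose pattern would also accept a name such as 1fooNYP.xml, because * matches any run of characters.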

For your specific requirements:

>>> import re
>>> file_list = ['20121011NYP.xml','foo.bar','zoo.txt','ABC1234.xml','20120101ABC.XML']
>>> exp = re.compile(r'^\d{8}NYP\.xml$', re.I)
>>> filtered_list = [x for x in file_list if exp.match(x)]
>>> filtered_list
['20121011NYP.xml']
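
The asker's function name suggests that the goal is the latest feed file, so here is a rough sketch of how the filtering above might be wrapped up for that. It assumes the eight digits are a YYYYMMDD date and that "latest" means the greatest date; the name latest_feed_file and the default pattern are hypothetical, not part of the question.

```python
import re

# Eight digits (assumed to be a YYYYMMDD date), then NYP.xml, case-insensitive.
FEED_RE = re.compile(r'\d{8}NYP\.xml', re.I)


def latest_feed_file(list_of_files, feed_re=FEED_RE):
    """Return the matching name with the greatest date prefix, or None."""
    # fullmatch requires the whole name to match, so no ^ or $ is needed.
    feed_files = [name for name in list_of_files if feed_re.fullmatch(name)]
    # YYYYMMDD prefixes sort chronologically when compared as strings.
    return max(feed_files, key=lambda name: name[:8], default=None)


files = ['.bash_logout', '20120910NYP.xml', '20120101NYP.xml', 'testfile_248.xml']
print(latest_feed_file(files))
```

Passing the compiled pattern as a parameter keeps the function reusable for other feed names, which seems to be what the edited question is after.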

Answer 3

Apparently you want something like this:

regex = re.compile(r'^\d{8}NYP\.xml$')

Please read the regular expression documentation. This really is regex basics.
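
One of those basics is that an unescaped dot matches any character, so a pattern for a file extension should normally escape it. A small illustration follows; the test name 20120910NYP_xml is made up for the example.

```python
import re

unescaped = re.compile(r'^\d{8}NYP.xml$')   # "." matches any single character
escaped = re.compile(r'^\d{8}NYP\.xml$')    # "\." matches only a literal dot

name = '20120910NYP_xml'   # hypothetical name with no real extension

print(bool(unescaped.match(name)))               # the loose pattern accepts it
print(bool(escaped.match(name)))                 # the escaped pattern rejects it
print(bool(escaped.match('20120910NYP.xml')))    # the real file still matches
```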

How to find XML files with a specific name
