Question

I have a Python list containing unicode strings:

mylist = [
  u'Path:path\\to\\some\\file.html\n   user ID: a.b.c\n',
  u'Path:somewhat\\longer\\path\\to\\some\\file.jpeg\n   user ID: a:b_c\n    someotherID:x:x:x\n'
]

I only need to extract the last component of each Path: in this example, file.html and file.jpeg. Is there a robust regex that can pull this information out of my list?

Answer 1

If you use ntpath instead of os.path, you get the correct behavior without resorting to a fragile regex:

>>> import ntpath
>>> [ntpath.basename(entry.split('\n')[0]) for entry in mylist]
[u'file.html', u'file.jpeg']

Contrary to what @Kasra said, your paths are valid; they just come from a different operating system.
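To illustrate why the module choice matters, here is a minimal sketch comparing ntpath with posixpath (the implementation behind os.path on Linux and macOS). It reuses mylist from the question; the names first_lines, nt_names and posix_names are only for illustration.

```python
import ntpath
import posixpath

mylist = [
    u'Path:path\\to\\some\\file.html\n   user ID: a.b.c\n',
    u'Path:somewhat\\longer\\path\\to\\some\\file.jpeg\n   user ID: a:b_c\n    someotherID:x:x:x\n',
]

# Only the first line of each entry holds the path.
first_lines = [entry.split('\n')[0] for entry in mylist]

# ntpath treats the backslash as a separator on every host OS.
nt_names = [ntpath.basename(line) for line in first_lines]

# posixpath only splits on '/', so each line comes back unchanged.
posix_names = [posixpath.basename(line) for line in first_lines]

print(nt_names)
print(posix_names)
```

Because ntpath is importable on any platform, the same code should behave identically whether it runs on Windows or not.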

Answer 2

You don't need a regex. You can use os.path, but you first need to replace \ with / and then call path.basename:

>>> from os import path
>>> [path.basename(i.split()[0].replace('\\','/')) for i in mylist if i]
[u'file.html', u'file.jpeg']
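One caveat: split() with no arguments splits on any whitespace, so a path containing spaces would be cut short. The sketch below uses a hypothetical entry (not from the question) to show the difference, and takes the whole first line instead. It uses posixpath directly so the result does not depend on the host OS; that substitution is an assumption for the sake of the example.

```python
import posixpath

# Hypothetical entry whose path contains spaces (not from the question).
entry = u'Path:my docs\\some folder\\file name.html\n   user ID: a.b.c\n'

# Whitespace split keeps only the text before the first space.
truncated = posixpath.basename(entry.split()[0].replace('\\', '/'))

# Taking the whole first line keeps the spaces intact.
first_line = entry.splitlines()[0]
full_name = posixpath.basename(first_line.replace('\\', '/'))

print(truncated)
print(full_name)
```

For the two entries in the question there are no spaces in the paths, so both approaches give the same result.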

Answer 3

import re

for path in mylist:
    # assuming each item in the list actually contains a path
    print(re.search(r'Path:(?:.*?\\)(\w+\.\w+)', path).group(1))
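The pattern above expects at least one backslash and a file name made of word characters with a single dot. A somewhat more tolerant variant, offered only as a sketch, captures everything after the last backslash on the Path line and returns None when nothing matches. The helper name last_path_component and the third test entry are hypothetical.

```python
import re

# Greedy '.*\\' backtracks to the last backslash on the line; the group
# is optional so a bare file name with no directory part also matches.
PATH_RE = re.compile(r'^Path:(?:.*\\)?([^\\\n]+)$', re.MULTILINE)

def last_path_component(entry):
    """Return the final component of the Path line, or None if absent."""
    match = PATH_RE.search(entry)
    return match.group(1) if match else None

mylist = [
    u'Path:path\\to\\some\\file.html\n   user ID: a.b.c\n',
    u'Path:somewhat\\longer\\path\\to\\some\\file.jpeg\n   user ID: a:b_c\n    someotherID:x:x:x\n',
    u'Path:archive.tar.gz\n   user ID: x\n',  # hypothetical: no directory, two dots
]

names = [last_path_component(entry) for entry in mylist]
print(names)
```

Returning None instead of calling .group(1) unconditionally avoids an AttributeError on entries that have no Path line.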

Regex to match different strings in a list
