Question

I have a list that looks like this:

list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

I only want the dates. I have a regex that looks like this:

r'\b(\d+/\d+/\d{4})\b'

But I really don't know how to apply it to the list. Or maybe it can be done some other way.

Any help would be greatly appreciated.

Answer 1

It's simple. Just use re.match:

>>> import re
>>> mylist = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
>>> dates = [x for x in mylist if re.match(r'\b(\d+/\d+/\d{4})\b', x)]
>>> dates
['1/4/2015', '1/4/2015', '1/4/2015']

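re.match only matches at the beginning of the string, so it is what you want in this case. Also, I wouldn't name a list "list", because that is the name of the built-in list class, and you may hurt yourself later if you try list(some_iterable). Best not to get into that habit.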
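To illustrate the naming point, here is a minimal sketch of what goes wrong once the name is shadowed. The function shadowing_demo is a hypothetical helper, used only so the shadowing stays local to it:

```python
def shadowing_demo():
    # Hypothetical helper: inside this function, "list" is rebound to a list object.
    list = ['1/4/2015', '2/4/2015']
    try:
        # This now tries to call the list object instead of the built-in type.
        list('abc')
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)

# Outside the function the built-in is untouched and still works.
print(list('ab'))
```

At module level the shadowing would last for the rest of the script, which is why a different name such as mylist is the safer choice.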

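Finally, your regex will match any string that starts with a date. If you want to make sure the whole string is a date, you could tighten it slightly to r'(\d{1,2}/\d{1,2}/\d{4})$', which ensures that the month and day are 1 or 2 digits each and the year is exactly 4 digits. Note that $ also matches just before a trailing newline; in Python 3.4+, re.fullmatch avoids that.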
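The difference between the two patterns can be sketched as follows. The sample strings are made up for illustration, and re.fullmatch is shown as an alternative way to require that the whole string is a date (it is not part of the original answer):

```python
import re

loose = re.compile(r'\b(\d+/\d+/\d{4})\b')         # pattern from the question
strict = re.compile(r'(\d{1,2}/\d{1,2}/\d{4})$')   # tightened pattern

# Made-up samples: a bare date, a date followed by text,
# a date preceded by text, and a "date" with a 3-digit first field.
samples = ['1/4/2015', '1/4/2015 at noon', 'on 1/4/2015', '123/4/2015']

# match() anchors at the start only, so trailing text still passes "loose".
loose_hits = [s for s in samples if loose.match(s)]
# Adding $ (and limiting the digit counts) rejects everything but the bare date.
strict_hits = [s for s in samples if strict.match(s)]
# fullmatch() anchors both ends without needing $ in the pattern.
full_hits = [s for s in samples if re.fullmatch(r'\d{1,2}/\d{1,2}/\d{4}', s)]

print(loose_hits)   # ['1/4/2015', '1/4/2015 at noon', '123/4/2015']
print(strict_hits)  # ['1/4/2015']
print(full_hits)    # ['1/4/2015']

# $ tolerates one trailing newline; fullmatch does not.
newline_ok_with_dollar = bool(strict.match('1/4/2015\n'))
newline_ok_with_fullmatch = bool(re.fullmatch(r'\d{1,2}/\d{1,2}/\d{4}', '1/4/2015\n'))
```

Neither pattern checks that the numbers form a real calendar date; they only check the shape of the string.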

Answer 2

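If the list is long, compiling the pattern first can give somewhat better performance. (The re module caches recently used patterns, so the gain mainly comes from skipping the cache lookup on every call.)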

import re

# list is the name of a built-in type in Python (not a keyword), so avoid
# shadowing it; when such a name is wanted for a variable, append an
# underscore. PEP 8 (https://www.python.org/dev/peps/pep-0008/) describes
# single_trailing_underscore_ as "used by convention to avoid conflicts
# with Python keyword".
list_ = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

date_pattern = re.compile(r'\b(\d+/\d+/\d{4})\b')

# In Python 3, filter returns a lazy iterator, so wrap it in list() to print it
print(list(filter(date_pattern.match, list_)))
# equivalent to
# print([i for i in list_ if date_pattern.match(i)])
# produces ['1/4/2015', '1/4/2015', '1/4/2015']
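
As a rough sketch for Python 3, the compiled pattern can be kept at module level and reused through a small helper. The names DATE_PATTERN and keep_dates are hypothetical, chosen only for this example:

```python
import re

# Compiled once at import time and reused on every call.
DATE_PATTERN = re.compile(r'\b(\d+/\d+/\d{4})\b')


def keep_dates(strings):
    """Return the strings that start with something shaped like a date."""
    return [s for s in strings if DATE_PATTERN.match(s)]


list_ = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

# filter() yields matches lazily in Python 3; nothing is computed yet.
lazy = filter(DATE_PATTERN.match, list_)

dates = keep_dates(list_)
print(dates)
# Consuming the iterator gives the same result as the helper.
same_result = list(lazy) == dates
print(same_result)
```

The list comprehension and filter are interchangeable here; the comprehension simply returns a list directly.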

Answer 3

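You can use re.match() for this.

Note: list is the name of a built-in type in Python (not a reserved keyword, but shadowing it hides the built-in). You should not use it as a variable name.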

import re
str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

# Using list(str_list) to iterate over the copy of 'str_list'
# to remove unmatched strings from the original list
for s in list(str_list):
    if not re.match(r'\b(\d+/\d+/\d{4})\b', s):
        str_list.remove(s)
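
Why the copy matters can be seen in a small sketch. The helper prune_in_place and the sample data are hypothetical, made up to show what happens when two non-matching strings sit next to each other:

```python
import re

DATE = re.compile(r'\b(\d+/\d+/\d{4})\b')


def prune_in_place(items, iterate_over_copy):
    """Remove non-date strings from items, optionally looping over a copy."""
    source = list(items) if iterate_over_copy else items
    for s in source:
        if not DATE.match(s):
            items.remove(s)
    return items


data = ['foo', 'bar', '1/4/2015', 'baz']

# Looping over a copy visits every element, so all non-dates are removed.
safe = prune_in_place(list(data), iterate_over_copy=True)
# Looping over the list being modified skips the element after each removal.
unsafe = prune_in_place(list(data), iterate_over_copy=False)

print(safe)    # ['1/4/2015']
print(unsafe)  # ['bar', '1/4/2015']
```

In the unsafe run, removing 'foo' shifts 'bar' into the position the loop has already passed, so 'bar' is never checked.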

Alternatively, if you also want to keep the original list, you can use a list comprehension:

import re
str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
new_list = [s for s in str_list if re.match(r'\b(\d+/\d+/\d{4})\b', s)]

Get specific strings from a list - Python
