Question

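I have two slugs, but I only want to capture one of them. I'm having trouble excluding the second one, which just adds a # followed by some text.

Here are the two slugs: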

slugs = ['/sub/12345678', '/sub/12345678#is']

Here is my attempt using Python's re:

import re

cleaned_slugs = []
for i in slugs:
    slug_check = re.match('/sub/[0-9]{8}[^#]', i).group(0)
    cleaned_slugs.append(slug_check)

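When I try this regex on Pythex, it selects only the first slug.

What am I doing wrong?

By the way, I know a for loop isn't the most elegant approach. I'd appreciate any shorter answer...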

Answer 1

You can check that the string ends right after the digits:

>>> import re
>>> pattern = re.compile(r'/sub/(\d+)$')
>>> slugs = ['/sub/12345678', '/sub/12345678#is']
>>> for slug in slugs:
...     match = pattern.search(slug)
...     if match:
...         print(match.group(1))
...
12345678

Here, $ matches the end of the string.
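For contrast, here is a small sketch of why the pattern from the question matches neither slug with re.match: [^#] is not a lookahead, so it has to consume one more character after the eight digits. In '/sub/12345678' there is nothing left to consume, and in '/sub/12345678#is' the next character is #. The last lines show that when both slugs sit in one multi-line test string (as you would likely paste them into Pythex), [^#] can match the newline after the first slug, which is probably why it seemed to work there.

```python
import re

slugs = ['/sub/12345678', '/sub/12345678#is']

# Pattern from the question: [^#] must consume one extra character that is not '#'.
original = re.compile(r'/sub/[0-9]{8}[^#]')
# Anchored version: $ asserts the end of the string without consuming anything.
anchored = re.compile(r'/sub/[0-9]{8}$')

for slug in slugs:
    print(slug, bool(original.match(slug)), bool(anchored.match(slug)))
# /sub/12345678 False True
# /sub/12345678#is False False

# Pythex-style input: both slugs in one multi-line test string.
text = '\n'.join(slugs)
print(repr(original.search(text).group(0)))  # '/sub/12345678\n'
```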

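FYI, I deliberately used \d+ instead of [0-9]{8}, because I doubt you really need to check for exactly 8 digits, given that it's a slug. If you do want that, just replace \d+ with [0-9]{8}.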
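A quick sketch of the practical difference between the two choices. The shorter slug '/sub/123' is a made-up example, not one from the question: \d+ accepts any run of one or more digits, while [0-9]{8} insists on exactly eight.

```python
import re

loose = re.compile(r'/sub/(\d+)$')        # one or more digits
strict = re.compile(r'/sub/([0-9]{8})$')  # exactly eight ASCII digits

# '/sub/123' is a hypothetical extra input, added only for comparison.
for slug in ['/sub/12345678', '/sub/123', '/sub/12345678#is']:
    m_loose = loose.search(slug)
    m_strict = strict.search(slug)
    print(slug,
          m_loose.group(1) if m_loose else None,
          m_strict.group(1) if m_strict else None)
# /sub/12345678 12345678 12345678
# /sub/123 123 None
# /sub/12345678#is None None
```

One further difference: for str patterns in Python 3, \d matches any Unicode decimal digit, not only ASCII 0-9, unless you pass re.ASCII.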

Also, see this thread for a shorter way to get a captured group: Getting captured group in one line.
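Along those lines, a compact alternative to the loop, assuming Python 3.8+ for the assignment expression (:=): a list comprehension that keeps only the captured digits of the slugs that match.

```python
import re

pattern = re.compile(r'/sub/(\d+)$')
slugs = ['/sub/12345678', '/sub/12345678#is']

# Requires Python 3.8+ for the := (walrus) operator.
ids = [m.group(1) for s in slugs if (m := pattern.search(s))]
print(ids)  # ['12345678']
```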

Answer 2

How about this?

print([s for s in slugs if '#' not in s])

or, equivalently:

print(list(filter(lambda s: '#' not in s, slugs)))
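Note that the '#' test only looks for the hash; it does not check that the string is a /sub/ slug at all. If that matters, here is a regex-free sketch. The rule that a valid slug is "/sub/ followed only by digits" is an assumption based on the two examples in the question, and is_plain_slug is a hypothetical helper name.

```python
slugs = ['/sub/12345678', '/sub/12345678#is']

PREFIX = '/sub/'  # assumed slug prefix, taken from the examples


def is_plain_slug(s):
    """True if s is '/sub/' followed only by digit characters (no '#...' suffix)."""
    rest = s[len(PREFIX):]
    return s.startswith(PREFIX) and rest.isdigit()


print([s for s in slugs if is_plain_slug(s)])  # ['/sub/12345678']
```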

Answer 3

If you want the whole slug including /sub/, but only the one without "#":

import re

slugs = ['/sub/12345678', '/sub/12345678#is']
cleaned_slugs = []
for i in slugs:
    patt = re.search(r'/sub/[0-9]{8}$', i)
    if patt:
        cleaned_slugs.append(patt.group())
print(cleaned_slugs)  # ['/sub/12345678']
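The same full-slug result can be written without the explicit loop. A sketch using re.fullmatch (Python 3.4+), which only succeeds if the pattern matches the entire string, so no $ anchor is needed:

```python
import re

slugs = ['/sub/12345678', '/sub/12345678#is']

# fullmatch succeeds only if the whole string matches, so a '#is' suffix rules a slug out.
cleaned_slugs = [s for s in slugs if re.fullmatch(r'/sub/[0-9]{8}', s)]
print(cleaned_slugs)  # ['/sub/12345678']
```

One small difference from $: $ also matches just before a trailing newline, so re.search(r'/sub/[0-9]{8}$', '/sub/12345678\n') succeeds, whereas fullmatch rejects that string.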

Answer 4

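As you said, the for loop is completely unnecessary; just use filter:

import re

reg = re.compile(r'/sub/(\d+)$')
slugs = ['/sub/12345678', '/sub/12345678#is']
cleaned_slug = list(filter(lambda s: reg.match(s), slugs))  # list() because filter is lazy in Python 3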
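Since reg.match is itself a callable, the lambda is optional. A sketch passing the bound method straight to filter(), materializing the lazy Python 3 iterator with list(), and then pulling out the captured digits:

```python
import re

reg = re.compile(r'/sub/(\d+)$')
slugs = ['/sub/12345678', '/sub/12345678#is']

# filter() keeps the items for which reg.match returns a (truthy) Match object.
cleaned_slug = list(filter(reg.match, slugs))
print(cleaned_slug)  # ['/sub/12345678']

# Every kept slug is known to match, so .group(1) is safe here.
ids = [reg.match(s).group(1) for s in cleaned_slug]
print(ids)  # ['12345678']
```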

Which regex should I use to exclude certain endings in a string?
