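Python re (regex): matching specific strings containing letters, hyphens, and digits

Time: 2016-12-11 01:21:34

Tags: python regex python-2.7

I'm trying to use Python's regular expression package, re, to match the following strings in Python 2.7, and I can't come up with the regex: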

https://www.this.com/john-smith/e5609239
https://www.this.com/jane-johnson/e426609216
https://www.this.com/wendy-saad/e172645609215
https://www.this.com/nick-madison/e7265609214
https://www.this.com/tom-taylor/e17265709211
https://www.this.com/james-bates/e9212

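So the prefix is fixed ("https://www.this.com/"), followed by a variable number of lowercase letters, then "-", then more lowercase letters, then "/e", and finally a variable number of digits.

Here is what I tried, without success: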

href=re.compile("https://www.this.com/people-search/[a-z]+[\-](?P<firstNumBlock>\d+)/")

href=re.compile("https://www.this.com/people-search/[a-z][\-][a-z]+/e[0-9]+")

Thanks for your help!

4 answers:

Answer 0 (score: 1)

href=re.compile(r"https://www\.mylife\.com/people-search/[a-z]+-[a-z]+/e[0-9]+")

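The escaped dots in this pattern matter. As a minimal sketch (the host is taken from the question's sample URLs, and the look-alike string is made up for illustration), an unescaped "." matches any character, so the looser pattern also accepts strings that contain no dots at all:

```python
import re

# Unescaped dots: "." matches any single character.
unescaped = re.compile(r"https://www.this.com/")
# Escaped dots: "\." matches only a literal dot.
escaped = re.compile(r"https://www\.this\.com/")

# Hypothetical look-alike string with other characters where the dots should be.
lookalike = "https://wwwXthisYcom/"

print(bool(unescaped.match(lookalike)))  # True
print(bool(escaped.match(lookalike)))    # False
```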

Answer 1 (score: 1)

You have a problem with escaping special characters. Because you are not using a raw string, the backslash has a special meaning in the string literal itself. In addition, a hyphen outside a character class (the [] syntax) does not need to be escaped in a regular expression. You can simplify your expression as follows:

expression = r"https://www.mylife.com/people-search/[a-z]+-[a-z]+/e\d+"

With the following data:

strings = ['https://www.mylife.com/people-search/john-smith/e5609239',
 'https://www.this.com/people-search/jane-johnson/e426609216',
 'https://www.this.com/people-search/wendy-saad/e172645609215',
 'https://www.this.com/people-search/nick-madison/e7265609214',
 'https://www.this.com/people-search/tom-taylor/e17265709211',
 'https://www.this.com/people-search/james-bates/e9212']
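To illustrate the raw-string point made in this answer, here is a minimal sketch. The sample text and variable names are made up for illustration and are not part of the original answer. It shows how the same backslash sequence means different things depending on whether the literal is raw:

```python
import re

# In a normal string literal, "\b" is a single backspace character (0x08).
# In a raw string, r"\b" stays as backslash + "b", which the regex engine
# interprets as a word boundary.
plain = "\bsmith\b"
raw = r"\bsmith\b"

text = "john-smith"  # hypothetical sample text

print(len("\b"), len(r"\b"))         # 1 2
print(re.search(plain, text))        # None: it looks for literal backspaces
print(re.search(raw, text).group())  # smith
```

Sequences such as "\d" happen to survive in a normal string because Python does not recognize them as string escapes, but relying on that is fragile, so a raw string is the safer habit for patterns.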

Answer 2 (score: 1)

re.compile(r'https://www.this.com/[a-z-]+/e\d+')

[a-z-]+ matches john-smith, and e\d+ matches e5609239.
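Note that [a-z-]+ is looser than [a-z]+-[a-z]+: it accepts any mix of lowercase letters and hyphens. The sketch below compares the two. The second and third URLs are hypothetical inputs added for illustration, and the dots are escaped here, which is an addition to the pattern above:

```python
import re

# Loose form: any run of lowercase letters and hyphens.
loose = re.compile(r"https://www\.this\.com/[a-z-]+/e\d+")
# Strict form: letters, exactly one hyphen, letters.
strict = re.compile(r"https://www\.this\.com/[a-z]+-[a-z]+/e\d+")

samples = [
    "https://www.this.com/john-smith/e5609239",    # from the question
    "https://www.this.com/johnsmith/e5609239",     # hypothetical: no hyphen
    "https://www.this.com/mary-ann-lee/e5609239",  # hypothetical: two hyphens
]

results = []
for url in samples:
    pair = (bool(loose.match(url)), bool(strict.match(url)))
    results.append(pair)
    print("%s loose=%s strict=%s" % (url, pair[0], pair[1]))
```

Which form is appropriate depends on whether names without a hyphen, or with several, can occur in the real data.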

Answer 3 (score: 1)

import re
from pprint import pprint

text = '''https://www.this.com/john-smith/e5609239
https://www.this.com/jane-johnson/e426609216
https://www.this.com/wendy-saad/e172645609215
https://www.this.com/nick-madison/e7265609214
https://www.this.com/tom-taylor/e17265709211
https://www.this.com/james-bates/e9212'''
href = re.compile(r'https://www\.this\.com/[a-zA-Z]+\-[a-zA-Z]+/e[0-9]+')
m = href.findall(text)
pprint(m)

Output:

['https://www.this.com/john-smith/e5609239',
'https://www.this.com/jane-johnson/e426609216',
'https://www.this.com/wendy-saad/e172645609215',
'https://www.this.com/nick-madison/e7265609214',
'https://www.this.com/tom-taylor/e17265709211',
'https://www.this.com/james-bates/e9212']
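If the individual parts of each URL are needed rather than the whole match, named groups (which the question's first attempt already hinted at) are one option. This is a minimal sketch, and the group names name and number are made up for illustration:

```python
import re

# Named groups capture the hyphenated name and the digits after "e".
href = re.compile(
    r"https://www\.this\.com/(?P<name>[a-z]+-[a-z]+)/e(?P<number>\d+)"
)

m = href.match("https://www.this.com/john-smith/e5609239")
print(m.group("name"))    # john-smith
print(m.group("number"))  # 5609239
```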