Extracting a session ID from a URL string in Python

Time: 2018-10-03 04:34:20

Tags: python regex

I have a URL:

  

mywebsite.com/idp/profile/SAML2/Redirect/SSO;jsessionid=CED11D31669BEAB45B4CDA651C7EBF3B.idp03?execution=e1s1

I want to extract the jsessionid value that comes after the semicolon but before .idp03: CED11D31669BEAB45B4CDA651C7EBF3B

How can I do this in Python?

3 answers:

Answer 0 (score: 2)

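import re

s = '/idp/profile/SAML2/Redirect/SSO;jsessionid=CED11D31669BEAB45B4CDA651C7EBF3B.idp03?execution=e1s1'

# Capture everything between "jsessionid=" and the last "." that follows it.
# A raw string keeps "\." from being treated as an invalid string escape.
re.findall(r'jsessionid=(.*)\.', s)
# ['CED11D31669BEAB45B4CDA651C7EBF3B']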
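One thing to keep in mind with this pattern: (.*) is greedy, so the capture runs up to the last dot after jsessionid=, not the first. The sketch below compares it with the lazy form (.*?) on a hypothetical URL that has a second dot in the query string; the URL is made up for illustration and does not come from the question.

```python
import re

# Hypothetical URL with an extra "." in the query string (illustration only).
s = '/SSO;jsessionid=ABC123.idp03?file=report.pdf'

# Greedy: backtracks from the end of the string to the LAST ".".
greedy = re.findall(r'jsessionid=(.*)\.', s)

# Lazy: stops at the FIRST "." after "jsessionid=".
lazy = re.findall(r'jsessionid=(.*?)\.', s)

print(greedy)
print(lazy)
```

For the URL in the question both forms give the same result, because only one dot follows jsessionid=.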

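Answer 1 (score: 1)

Here I would use a more careful pattern, one that checks for each of the possible terminators of the jsessionid value:

  • . if there is an extension
  • ? if there is no extension such as .idp03
  • $ (end of string) if there is neither an extension nor query parameters

Putting it together, we have this: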

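import re

# Named "url" rather than "input" to avoid shadowing the built-in.
url = '/idp/profile/SAML2/Redirect/SSO;jsessionid=CED11D31669BEAB45B4CDA651C7EBF3B.idp03?execution=e1s1'
# Lazily capture up to the first ".", "?", or the end of the string.
result = re.search(r'jsessionid=(.*?)(?=[.?]|$)', url)

if result:
    print("jsessionid :", result.group(1))
else:
    print("no jsessionid found")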
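The three terminating cases listed above can be checked with a short sketch. The helper name extract_jsessionid and the shortened sample URLs are assumptions made for illustration; only the regular expression comes from this answer.

```python
import re

# Hypothetical helper; the name is illustrative, not from the answer.
def extract_jsessionid(url):
    """Return the jsessionid value, or None if the URL has none."""
    match = re.search(r'jsessionid=(.*?)(?=[.?]|$)', url)
    return match.group(1) if match else None

# Made-up sample URLs, one per terminating case.
samples = [
    '/SSO;jsessionid=ABC123.idp03?execution=e1s1',  # value ends at "."
    '/SSO;jsessionid=ABC123?execution=e1s1',        # value ends at "?"
    '/SSO;jsessionid=ABC123',                       # value ends at end of string
    '/SSO?execution=e1s1',                          # no jsessionid at all
]

for sample in samples:
    print(sample, '->', extract_jsessionid(sample))
```

Returning None for the missing case lets the caller decide how to handle a URL without a session ID, much like the if/else in the code above.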

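Answer 2 (score: 0)

  1. (?<=jsessionid=) is a positive lookbehind: the match starts immediately after jsessionid=

  2. \w+ matches one or more word characters (letters, digits, underscore)

Code: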

import re
s = "mywebsite.com/idp/profile/SAML2/Redirect/SSO;jsessionid=CED11D31669BEAB45B4CDA651C7EBF3B.idp03?execution=e1s1"
print(re.findall(r"(?<=jsessionid=)\w+",s)) # ['CED11D31669BEAB45B4CDA651C7EBF3B']
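Since a URL normally carries a single jsessionid, the same lookbehind pattern can also be used with re.search, which returns one match object (or None) instead of a list. This is a minimal sketch; the variable names match and session_id are assumptions, not part of the answer.

```python
import re

s = "mywebsite.com/idp/profile/SAML2/Redirect/SSO;jsessionid=CED11D31669BEAB45B4CDA651C7EBF3B.idp03?execution=e1s1"

# The lookbehind consumes no characters, so group(0) is just the ID itself.
match = re.search(r"(?<=jsessionid=)\w+", s)
session_id = match.group(0) if match else None

print(session_id)
```

Because \w does not match ".", the match stops right before .idp03 without naming the terminator explicitly.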