python正则表达式如何避免匹配多个分号?

时间:2015-03-02 22:56:11

标签: regex logstash

我要写一个正则表达式来提取子串。字符串是:

  

ASP.NET_SessionId=frffcjcarie4dhxouz5yklwu;+BIGipServercapitaliq-ssl=3617221783.36895.0000;+ObSSOCookie=wkyQfn2Cyx2%2f7kSj4zBB886WaLs92Ord9FSf64c%2byHFOBwgEP4f3UmorDj051suQwRXAKEwBtYVKRYJuUGh2YNZtAj2%2bNp8asLIT9xQPqVktEAzkl3jNIv8MyWFsoFPDtm%2fTm1FeaCP%2bGTk9Oa%2fCNA0Hmy847qK2qo7%2bbziV%2bjeClbkGjAX3pgcPzfs%2bQp7p9BSjP1xJqUaUKwJ2%2flIgzZL5Ma%2bnJK8j%2b732ixNyIDNDGo7uIF%2b;+machineIdCookie=866873600;+userLoggedIn=jga;sdgjefdfdfs

我想提取一个以ObSSOCookie=....;开头并在userLoggedIn之前结束的子字符串。

我设置了我的正则表达式

pattern = "ObSSOCookie=.*;" 

但它继续提取,直到最后分号(包括+machineIdCookie=866873600),而不是第一个分号,这就是我想要的。< / p>

有没有办法提取到第一个分号?我不能只使用split“;”因为这个正则表达式实际上是用在Logstash配置文件中,并且没有办法在那里使用python样式的编码......

2 个答案:

答案 0 :(得分:1)

你想让你的正则表达式非贪婪

而不是使用此

*  - zero or more

使用此

*? - zero or more (non-greedy)

这是你的表达式(demo)。

ObSSOCookie=(.*?;)

这是一项通用技术,也在this answer中进行了描述。

答案 1 :(得分:0)

为什么不抓住下一个;之外的任何内容(demo

 ObSSOCookie=([^;]*)


>>> import re
>>> data = 'ASP.NET_SessionId=frffcjcarie4dhxouz5yklwu;+BIGipServercapitaliq-ssl=3617221783.36895.0000;+ObSSOCookie=wkyQfn2Cyx2%2f7kSj4zBB886WaLs92Ord9FSf64c%2byHFOBwgEP4f3UmorDj051suQwRXAKEwBtYVKRYJuUGh2YNZtAj2%2bNp8asLIT9xQPqVktEAzkl3jNIv8MyWFsoFPDtm%2fTm1FeaCP%2bGTk9Oa%2fCNA0Hmy847qK2qo7%2bbziV%2bjeClbkGjAX3pgcPzfs%2bQp7p9BSjP1xJqUaUKwJ2%2flIgzZL5Ma%2bnJK8j%2b732ixNyIDNDGo7uIF%2b;+machineIdCookie=866873600;+userLoggedIn=jga;sdgjefdfdfs'
>>> p = re.compile('ObSSOCookie=([^;]*)')
>>> m = p.search(data)
>>> m.group(1)
'wkyQfn2Cyx2%2f7kSj4zBB886WaLs92Ord9FSf64c%2byHFOBwgEP4f3UmorDj051suQwRXAKEwBtYVKRYJuUGh2YNZtAj2%2bNp8asLIT9xQPqVktEAzkl3jNIv8MyWFsoFPDtm%2fTm1FeaCP%2bGTk9Oa%2fCNA0Hmy847qK2qo7%2bbziV%2bjeClbkGjAX3pgcPzfs%2bQp7p9BSjP1xJqUaUKwJ2%2flIgzZL5Ma%2bnJK8j%2b732ixNyIDNDGo7uIF%2b'