Regular expression to get the view state from a page using the Python re module

Date: 2012-06-01 10:52:09

Tags: python regex

I have a string like this:

  

|0|hiddenField|_EVENTTARGET||0|hiddenField|_EVENTARGUMENT||0|hiddenField|_LASTFOCUS||3848|hiddenField|_VIEWSTATE|S+waYnNoWvsZV2X3xG1WKKtdIXQAqyoTV2oVgE/oyaJDgtLOoHSohEYPdTK7NsrM64VCXtyQQ23FOfUzWFNso8FQlu2JSomP7JGAoxpg2RzFfOWjOMaDDo/iKpjDiAjVMUYHdPvKSzTXpsPmqg49y4yEMfViFOowPSoTrojZAIWfAH0DvqvBHOFDozoDfk283k+O5JEXiyJKHXQ6p0IjFZOjr+tIckyRPPH8vJkCo7xntoQ+ZV8KlzrBfJnRthm1XOkkxX73DTW0mByIbATIAdVUwNWQ4lrapOaaxh5y7AjoxlcpyyG2rkezkKquaZyf0kZg5+Yd9HnUmXUY5VsEj2NYIyuppVoYuFQPeoRXbC/SQV6m9Gf++VYhm+sVo8sx8Dvoitelm3R617/zEi71VIrJlk51BH6DnWwaWoHH6gygSHslVwP+iFTao+LR5fekfjAf+BeTgBshc8BVGwslQxJ+YBmyttQxSddb8WyqEGHX2Wc6XqXCKSA8XEgad/42lPRknrLtCLM1b3sn7xWQxUUnb2pEDRc96C+tNUPAy3CZPS2Uq/aCsJqQTp9EssnMhKvfUJTplEbxd9xX2KfeTRa51cDZGzQbBk3L8steG83ehGm1hsU42hdx/3GICA1eKsDmFKpA0D2/NWpnG4rYWJK+MhzbnveqbX0Cak6VyEOLcmaD0dhYz9kOhOhc7h3ntWcbE40qbKhhKTE8Yq9voAqRGFT2AuGTThtbGfQ2GYoua8Oz8pPSgkGYsOcU6dI0vtoEdeH9rUC3a2vLLigVXeQ2bCbAFIkzrpHSfHsp9TLE1AoX3E57//23ZcwzDTJiPYottJGwxn3cnenh8xOdcoQM+7qkDiaD7CVUrvN9p8dmtQjtYHNbt7D8m/SZjvA/SmmAfIKMA==|0|hiddenField|

I need to extract the value of __VIEWSTATE from this string:

  

S+waYnNoWvsZV2X3xG1WKKtdIXQAqyoTV2oVgE/oyaJDgtLOoHSohEYPdTK7NsrM64VCXtyQQ23FOfUzWFNso8FQlu2JSomP7JGAoxpg2RzFfOWjOMaDDo/iKpjDiAjVMUYHdPvKSzTXpsPmqg49y4yEMfViFOowPSoTrojZAIWfAH0DvqvBHOFDozoDfk283k+O5JEXiyJKHXQ6p0IjFZOjr+tIckyRPPH8vJkCo7xntoQ+ZV8KlzrBfJnRthm1XOkkxX73DTW0mByIbATIAdVUwNWQ4lrapOaaxh5y7AjoxlcpyyG2rkezkKquaZyf0kZg5+Yd9HnUmXUY5VsEj2NYIyuppVoYuFQPeoRXbC/SQV6m9Gf++VYhm+sVo8sx8Dvoitelm3R617/zEi71VIrJlk51BH6DnWwaWoHH6gygSHslVwP+iFTao+LR5fekfjAf+BeTgBshc8BVGwslQxJ+YBmyttQxSddb8WyqEGHX2Wc6XqXCKSA8XEgad/42lPRknrLtCLM1b3sn7xWQxUUnb2pEDRc96C+tNUPAy3CZPS2Uq/aCsJqQTp9EssnMhKvfUJTplEbxd9xX2KfeTRa51cDZGzQbBk3L8steG83ehGm1hsU42hdx/3GICA1eKsDmFKpA0D2/NWpnG4rYWJK+MhzbnveqbX0Cak6VyEOLcmaD0dhYz9kOhOhc7h3ntWcbE40qbKhhKTE8Yq9voAqRGFT2AuGTThtbGfQ2GYoua8Oz8pPSgkGYsOcU6dI0vtoEdeH9rUC3a2vLLigVXeQ2bCbAFIkzrpHSfHsp9TLE1AoX3E57//23ZcwzDTJiPYottJGwxn3cnenh8xOdcoQM+7qkDiaD7CVUrvN9p8dmtQjtYHNbt7D8m/SZjvA/SmmAfIKMA==

I tried several patterns with the re module, but none of them worked. Can anyone help?

2 Answers:

Answer 0: (score: 2)

This works:

_VIEWSTATE\|([^|]*)

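Demo: http://rubular.com/r/JoFyUu5NsC

And a refinement from @dbaupp, which only matches when _VIEWSTATE sits at the start of the string or directly after a | delimiter: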

(?:^|\|)_VIEWSTATE\|([^|]*)

http://rubular.com/r/HmnapACGEw
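In Python, both patterns can be tried with `re.search`. The following is only a minimal sketch: the shortened `sample` string and the `extract_viewstate` helper are made up for illustration and are not from the question.

```python
import re

# Hypothetical, shortened sample in the same pipe-delimited layout as the question.
sample = (
    "|0|hiddenField|_EVENTTARGET||0|hiddenField|_LASTFOCUS|"
    "|8|hiddenField|_VIEWSTATE|S+wa/A==|0|hiddenField|"
)

# The two patterns from this answer, compiled once for reuse.
SIMPLE = re.compile(r"_VIEWSTATE\|([^|]*)")
ANCHORED = re.compile(r"(?:^|\|)_VIEWSTATE\|([^|]*)")


def extract_viewstate(text, pattern=SIMPLE):
    """Return the captured view state, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract_viewstate(sample))
print(extract_viewstate(sample, ANCHORED))
```

The anchored variant refuses a field name that merely ends in `_VIEWSTATE` (for example `OLD_VIEWSTATE`), while the simple one would accept it. Note that the anchored pattern expects `_VIEWSTATE` immediately after the `|`; if the actual response names the field `__VIEWSTATE` with two leading underscores, as ASP.NET normally does, the second underscore would have to be added to that pattern.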

Answer 1: (score: 1)

This regex does it:

_VIEWSTATE\|([^|"]*)

It stores the view state in group 1. The Python code to use is

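import re

# subject is the text to search. The pattern is written in single quotes
# so that it can contain a literal double quote.
reobj = re.compile(r'_VIEWSTATE\|([^|"]*)')
match = reobj.search(subject)
if match:
    result = match.group(1)  # the view state value
else:
    result = ""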

Reading your comment, I think the closing delimiter could also be a double quote ("), right? You are dealing with an ASP.NET view state.
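To illustrate what the extra `"` in the character class changes, here is a small sketch. The `quoted` string is an invented example of a value terminated by a double quote rather than a pipe; it is not data from the question.

```python
import re

# Hypothetical input where the value ends at a double quote instead of a pipe.
quoted = 'var delta = "|8|hiddenField|_VIEWSTATE|S+wa/A==";'

# Stops only at a pipe, so it runs past the closing quote.
pipe_only = re.search(r'_VIEWSTATE\|([^|]*)', quoted)
# Stops at a pipe or a double quote, whichever comes first.
pipe_or_quote = re.search(r'_VIEWSTATE\|([^|"]*)', quoted)

print(pipe_only.group(1))
print(pipe_or_quote.group(1))
```

With pipe-terminated input like the string in the question, both character classes capture the same value, since base64 text never contains a double quote.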