Regular expression to capture an HTML hidden input

Date: 2013-09-30 20:49:36

Tags: python html regex

I am trying to capture the Joomla token with Python and pycurl. I wrote this function:

import urllib, urllib2, sys, re
import cStringIO
import pycurl

def CaptureToken(cURL):
    buf = cStringIO.StringIO()
    c = pycurl.Curl()
    c.setopt(c.URL, cURL)
    c.setopt(c.WRITEFUNCTION, buf.write)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.perform()
    html = buf.getvalue()
    buf.close()
    results = re.match(r"(type=\"hidden\" name=\"([0-9a-f]{32})\")", html).group(1)
    print results

CaptureToken('http://www.proregionisbono.org.pl/administrator/index.php')

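This regular expression works in Notepad++, but it does not work in Python :( Can someone please help me?

1 answer:

Answer 0: (score: 3)

re.match only matches at the beginning of the string. You probably want re.search, which looks for a match anywhere in the string.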

Python docs
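
The difference is easy to see on a small string. This is a minimal sketch, assuming a made-up HTML fragment and token value (neither is taken from the real page):

```python
import re

# Hypothetical HTML fragment; the token value is invented for illustration.
html = '<form><input type="hidden" name="0123456789abcdef0123456789abcdef" value="1" /></form>'
pattern = r'type="hidden" name="([0-9a-f]{32})"'

# re.match only tries the pattern at position 0, so it finds nothing here.
matched = re.match(pattern, html)

# re.search scans the whole string and finds the attribute pair.
found = re.search(pattern, html)
token = found.group(1)
```

Here matched is None because the string starts with "<form>", while found is a match object whose first group holds the 32-character hex name.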

This version of the code works for me:

import urllib, urllib2, sys, re
import cStringIO
import pycurl

def CaptureToken(cURL):
    buf = cStringIO.StringIO()
    c = pycurl.Curl()
    c.setopt(c.URL, cURL)
    c.setopt(c.WRITEFUNCTION, buf.write)
    c.setopt(c.CONNECTTIMEOUT, 30) 
    c.setopt(c.TIMEOUT, 30) 
    c.perform()
    html = buf.getvalue()
    buf.close()
    results = re.search(r'(type="hidden" name="([0-9a-f]{32})")', html).group(2)
    print results

CaptureToken('http://www.proregionisbono.org.pl/administrator/index.php')
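
One thing to watch: re.search returns None when nothing matches, so calling .group on the result raises AttributeError if the page contains no such input. Below is a hedged sketch of a helper that checks for this. The name find_token is hypothetical, and the pattern still assumes that the type attribute comes directly before name, as in the code above:

```python
import re

# Same pattern as above, compiled once; only the token itself is captured.
TOKEN_RE = re.compile(r'type="hidden" name="([0-9a-f]{32})"')

def find_token(html):
    """Return the 32-character hex token name, or None if it is not found."""
    m = TOKEN_RE.search(html)
    return m.group(1) if m else None
```

The caller can then test the result for None and report a clear error instead of failing inside the regular expression call.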