Unable to parse the output of HTTPConnection.debuglevel

Time: 2010-10-18 21:21:15

Tags: python

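I am trying to programmatically check the output of a tcp stream. I can get the results of the tcp stream by turning on debug in HTTPConnection, but how do I read that data and evaluate it with a regular expression? I keep getting "TypeError: expected string or buffer". Is there a way to convert the results to a string? Thanks!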

SCRIPT:

from urllib2 import Request, urlopen, URLError, HTTPError
import urllib2
import cookielib
import httplib
import re

httplib.HTTPConnection.debuglevel = 1 
p = re.compile('abc=..........')

cj = cookielib.CookieJar()
proxy_address = '192.168.232.134:8083' # change the IP:PORT, this one is for example
proxy_handler = urllib2.ProxyHandler({'http': proxy_address})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPCookieProcessor(cj), urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
url = "http://www.google.com/" # change the url
req=urllib2.Request(url)
data=urllib2.urlopen(req)
m=p.match(data)
if m:
    print "Match found."
else:
    print "Match not found."

Result:

send: 'GET hyperlink/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 303 See Other\r\n'
header: Location: hyperlink:8083/3240951276
header: Set-Cookie: abc=3240951276; path=/; domain=.google.com; expires=Thu, 31-Dec-2020 23:59:59 GMT
header: Content-Length: 0
send: 'GET hyperlink/3240951276 HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: hyperlink\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 303 See Other\r\n'
header: Location: hyperlink
header: Set-Cookie: abc=3240951276; path=/; expires=Thu, 31-Dec-2020 23:59:59 GMT
header: Content-Length: 0
send: 'GET http://www.google.com/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nCookie: abc=3240951276\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Mon, 18 Oct 2010 21:09:32 GMT
header: Expires: -1
header: cache-control: max-age=0, private, private
header: Content-Type: text/html; charset=ISO-8859-1
header: Set-Cookie: PREF=ID=066bc785a2b15ef6:FF=0:TM=1287436172:LM=1287436172:S=mNiXaRhshpf8nLji; expires=Wed, 17-Oct-2012 21:09:32 GMT; path=/; domain=.google.com
header: Set-Cookie: NID=39=ur3gnXL80kEy4shKAh8_-XV8PhmS4G83slPcX9OD3L6uthQZw-wq7RUnB0WKGYR3F_QGoyZAyEPCvjdi69EXXq23dzvpuZSl_KU2o7pqcTB7Vym4co1LOXmi9YQGpbkb; expires=Tue, 19-Apr-2011 21:09:32 GMT; path=/; domain=.google.com; HttpOnly
header: Server: gws
header: X-XSS-Protection: 1; mode=block
header: Connection: close
header: Content-Length: 4676
header: X-Con-Reuse: 1
header: Content-Encoding: gzip
header: via: 1.1 HermesPrefetch (CID2627003316.AID3240951276.TID1)
header: X-Trace-Timing: Start=1287436172845, Sched=0, Dns=2, Con=11, RxS=28, RxD=35
Traceback (most recent call last):
  File "C:\Documents and Settings\asdf\workspace\PythonScripts2\src\Test1.py", line 18, in <module>
    m=p.match(data)
TypeError: expected string or buffer

1 answer:

Answer 0 (score: 0)

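The debug information from httplib that you see in your terminal is not actually part of the object returned by urllib2.urlopen(). Instead, it is printed directly to your process's sys.stdout. Unfortunately, httplib offers no way to change this behaviour. It is not entirely clear to me what you are trying to achieve by "capturing" this output and running a regular expression over it, but if that really is what you want to do, you will need to replace sys.stdout with something else, such as a suitable StringIO object, and then somehow work out which part of the output is the part you care about.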
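As a rough sketch of the sys.stdout replacement described above (this is not something httplib itself provides): the helper capture_stdout and the stand-in function fake_request are hypothetical names introduced for illustration. fake_request only writes one header line copied from the debug output in the question; in the real script you would pass a function that calls urllib2.urlopen(req) instead. The import fallback is an assumption added so the sketch also runs on Python 3.

```python
import re
import sys

try:
    from StringIO import StringIO  # Python 2, as used in the question
except ImportError:
    from io import StringIO  # Python 3 fallback (assumption, not in the question)


def capture_stdout(func, *args, **kwargs):
    """Run func with sys.stdout temporarily replaced; return (result, text)."""
    original = sys.stdout
    buffer = StringIO()
    sys.stdout = buffer
    try:
        result = func(*args, **kwargs)
    finally:
        sys.stdout = original  # always restore, even if func raises
    return result, buffer.getvalue()


def fake_request():
    # Hypothetical stand-in for urllib2.urlopen(req) with debuglevel=1.
    # It writes one header line copied from the output in the question.
    sys.stdout.write(
        "header: Set-Cookie: abc=3240951276; path=/; "
        "expires=Thu, 31-Dec-2020 23:59:59 GMT\n"
    )


_, debug_text = capture_stdout(fake_request)

# search() scans the whole text; match() only tries the very beginning.
found = re.search(r"abc=(\d+)", debug_text)
cookie_value = found.group(1) if found else None
```

Note the use of search() rather than match(): match() only tries the pattern at the very start of the string, so it would not find the cookie in the middle of the debug text. Anything else the process prints while sys.stdout is swapped lands in the same buffer, which is why you still have to pick out the lines you care about.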

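However, keep in mind that all the information httplib produces in its debugging output is available to your program directly. It is either based on what you pass to httplib (through urllib2), or it is part of the server response and therefore available in the object returned by urllib2.urlopen(). For example, it looks like you are trying to extract cookie information, which you can get simply by reading the cookies from the CookieJar you already provide. There does not seem to be any sensible reason to capture the output and parse it.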
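A minimal sketch of reading the cookie from the jar instead: get_cookie_value is a hypothetical helper, and the jar here is filled by hand with a stand-in cookie (values taken from the debug output in the question) so the example runs without a network. In the question's script, urllib2 would fill cj during urlopen(). The import fallback is again an assumption added for Python 3.

```python
try:
    import cookielib  # Python 2, as used in the question
except ImportError:
    import http.cookiejar as cookielib  # Python 3 fallback (assumption)


def get_cookie_value(jar, name):
    """Return the value of the first cookie called `name`, or None."""
    for cookie in jar:  # iterating a CookieJar yields its Cookie objects
        if cookie.name == name:
            return cookie.value
    return None


# Stand-in for the jar that urllib2 would fill during urlopen():
# one cookie built by hand from the Set-Cookie header in the question.
cj = cookielib.CookieJar()
cj.set_cookie(cookielib.Cookie(
    version=0, name="abc", value="3240951276",
    port=None, port_specified=False,
    domain=".google.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))

abc_value = get_cookie_value(cj, "abc")
```

In the question's script, calling get_cookie_value(cj, "abc") after urllib2.urlopen(req) should give the value set during the redirects, with no debug output involved.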