Capture all live HTTP header data

Date: 2015-07-10 01:23:01

Tags: python http-headers python-requests

I don't see a way to do this with the Requests module, but maybe I'm missing something...

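I need to be able to capture all live HTTP header data, the way a Firefox add-on such as the creatively named Live HTTP Headers plugin does.

Is there a way to capture the header data so that I can collect the following (or something as close to it as possible)?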

https://instagram.com/oauth/authorize/?client_id=cb0096f08a3848e6a355f&redirect_uri=https://pythondev.geometryfletch.com/instagramredirect.html&response_type=code&hl=hu

GET /oauth/authorize/?client_id=cb0096f08a3848e6a355f&redirect_uri=https://pythondev.geometryfletch.com/instagramredirect.html&response_type=code&hl=hu HTTP/1.1
Host: instagram.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:38.0) Gecko/20100101 Firefox/38.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: csrftoken=4d9696d270a1d2d7b4d1b5; mid=U8lMswAEAAGyEMGTjENK; __utma=227057989.1190820776.1417498356.1417498356.1417498356.1; sessionid=IGSCb5786690876faa5d2505e1d8b3782691614164cb344c52ec2a6714cb5e1cd884%3Akds8RALygAnGbeQMAiLU%3A%7B%22_token_ver%22%3A1%2C%22_auth_user_id%22%3A324232C%22_token%22%3A9437%3A1lhXdDvRNvbT4MS1J5QpeBmG%3Ac0ccc4aebd1d88175db75c9ce360ad595c55946577bcb9ebc%22%2C%22_auth_user_backend%22%3A%22accounts.backends.CaseInsensitiveModelBackend%22%2C%22last_refreshed%22%3A1436481638.349811%2C%22_platform%22%3A4%7D; ds_user_id=324239437
Connection: keep-alive

HTTP/1.1 302 FOUND
Cache-Control: private, no-cache, no-store, must-revalidate
Content-Language: hu
Content-Type: text/html; charset=utf-8
Date: Thu, 09 Jul 2015 22:46:21 GMT
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: https://pythondev.devtesting.com/instagramredirect.html?code=2c49fd7803384c6c5a89cee
Pragma: no-cache
Set-Cookie: csrftoken=4d9696dac6b0d5b8591b5; expires=Thu, 07-Jul-2016 22:46:21 GMT; Max-Age=31449600; Path=/
Vary: Cookie, Accept-Language
Content-Length: 0
Connection: keep-alive

What I really need is the URL string in the Location header, which looks like this:

Location: https://pythondev.devtesting.com/instagramredirect.html?code=2c49fd7803384c6c5a89cee

After searching for possible solutions, I've been trying variations on the following (the client_id and redirect URI have been changed for this post):

import requests

OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb00962b4601317355f&redirect_uri=https://pythondev.instadev.com/instagramredirect.html&response_type=code"
r = requests.get(OAuthURL, stream=True)
print(r.raw.data)

But, of course, all I get is gibberish like this:

ôr˼ÖtÉxlÏß5g·Ì{þµ¼æ®6×MƦ¶Ök:µ#î^Bm,\ûf+ÈÕúµçoO´Úö3ut×]Ta¡*_@[BsÊqgÅëêw×ûQÁç)óf-ÕD[³Û®3×*ï¥Ôï`æ:$nÑÞZ£ô)©ª[}«ØBA"¿²å¿*ÜÞ1BuĹ!DGwËUhµ?:PnmwbâÿK¯ÈIÅ¡#2R¸@¼'ø>"dPtOÈm"W fÞ xöñ­¯vmG cÆÔ>÷οaâykãyk¤=²"ù*A¦=ýz=²3&¤ö©½õ CËIMÛÓ¯6Î(íirG*«
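
Those bytes are most likely the response *body*, not the headers, and still compressed. `r.raw` exposes the undecoded bytes read from the socket, and requests advertises `Accept-Encoding: gzip, deflate` by default, so the server is free to gzip the page. The sketch below uses only the standard `gzip` module and a hypothetical HTML body to show why compressed bytes print as noise. With requests, `r.content` (or `r.raw.read(decode_content=True)`) gives you the decompressed body instead.

```python
import gzip

# Hypothetical HTML body standing in for the real page content.
body = b"<html><body>redirect target</body></html>"

# Roughly what the server sends back when the client accepts gzip.
compressed = gzip.compress(body)

# Printing the compressed bytes yields unreadable output, much like r.raw.data.
print(compressed[:20])

# Decompressing recovers the original bytes; requests does this for r.content.
restored = gzip.decompress(compressed)
print(restored.decode("utf-8"))
```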

Would sockets work for this? Or is there another module I could use that would let me collect the headers the same way a browser's HTTP header plugin does?

1 answer:

Answer 0 (score: 1)

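requests already returns the headers to you: `r.headers` holds the response headers and `r.request.headers` the headers that were sent. Both support dict-style access, and header names are case-insensitive.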
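
To get output close to what Live HTTP Headers shows for the request side, you can format the `PreparedRequest` that requests builds (after a real call it is available as `r.request`). The sketch below builds one without sending it, so the URL and header are hypothetical placeholders, and the `format_request` helper is illustrative, not part of requests. The `HTTP/1.1` in the request line is hard-coded as an assumption, and the `Host` header is absent because it is added later, when the request is actually sent.

```python
import requests

def format_request(prep):
    """Hypothetical helper: render a PreparedRequest like a header-capture plugin."""
    lines = [f"{prep.method} {prep.path_url} HTTP/1.1"]  # protocol version assumed
    lines += [f"{name}: {value}" for name, value in prep.headers.items()]
    return "\n".join(lines)

# Build (but do not send) a request; URL and header values are placeholders.
prep = requests.Request(
    "GET",
    "https://example.com/oauth/authorize/?response_type=code",
    headers={"Accept-Language": "en-US,en;q=0.5"},
).prepare()

print(format_request(prep))

# Header containers in requests are case-insensitive dicts.
print(prep.headers["accept-language"])
```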

If you want `requests.get` to return the redirect response itself instead of following it automatically, pass `allow_redirects=False`:

# UNTESTED
import requests

OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb00962b4601317355f&redirect_uri=https://pythondev.instadev.com/instagramredirect.html&response_type=code"
# Don't follow the 302; return it so its headers can be read directly
r = requests.get(OAuthURL, allow_redirects=False)
print(r.headers.keys())
print(r.headers['location'])
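
Since what you ultimately want from `Location` is the OAuth `code`, the standard `urllib.parse` module can pull it out of the query string. This sketch hard-codes the `Location` value from the capture in the question; in practice you would use `location = r.headers['location']` from the response above.

```python
from urllib.parse import urlsplit, parse_qs

# Value taken from the captured 302 response; normally r.headers['location'].
location = "https://pythondev.devtesting.com/instagramredirect.html?code=2c49fd7803384c6c5a89cee"

# parse_qs maps each query parameter to a list of values.
query = parse_qs(urlsplit(location).query)
code = query["code"][0]
print(code)  # prints 2c49fd7803384c6c5a89cee
```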

Alternatively, leave `allow_redirects=True` (the default) and inspect `r.history`, which holds the intermediate redirect responses:

# UNTESTED
import requests

OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb00962b4601317355f&redirect_uri=https://pythondev.instadev.com/instagramredirect.html&response_type=code"
r = requests.get(OAuthURL, allow_redirects=True)
# r.history lists each redirect response in order, oldest first
print([resp.headers['location'] for resp in r.history])
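
As for the raw-socket route: the standard library's `http.client` is one level above sockets, sends exactly one request, and never follows redirects, so the 302 and its headers come back as-is. To keep the sketch self-contained it starts a hypothetical local server that always answers with a 302; against the real endpoint you would use `http.client.HTTPSConnection("api.instagram.com")` and the real path instead.

```python
import http.client
import http.server
import threading

class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the OAuth endpoint: always replies 302."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "https://example.com/callback?code=abc123")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default request logging

# Port 0 lets the OS pick a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/oauth/authorize/?response_type=code")
resp = conn.getresponse()

# Status line and headers, similar to what the browser plugin displays.
print(resp.status, resp.reason)
for name, value in resp.getheaders():
    print(f"{name}: {value}")

location = resp.getheader("Location")
resp.read()  # drain the (empty) body before closing
conn.close()
server.shutdown()
server.server_close()
```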