Question

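I am working on a Sina Weibo crawler that uses the Weibo API. To use the API, I have to visit the OAuth2 authorization page and retrieve the code from the URL.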

This is exactly what I do:

Use my app_key and app_secret (both known)
Get the URL of the OAuth2 web page
Manually copy and paste the code from the response URL.

Here is my code:

import webbrowser

from weibo import APIClient  # assumed: the sinaweibopy SDK

# Create the client with the official SDK
# (APP_KEY, APP_SECRET and CALLBACK_URL are defined elsewhere)
client = APIClient(app_key=APP_KEY, app_secret=APP_SECRET, redirect_uri=CALLBACK_URL)

# Get the URL of the authorization page
url = client.get_authorize_url()
print url

# Open the authorization page in the browser
webbrowser.open_new(url)

# After the page redirects, copy the code from the URL by hand
print "Enter the string after 'code=' in the URL:"
code = raw_input()

My question is: how exactly can I get rid of the manual parsing step?

Reference: http://blog.csdn.net/liuxuejiang158blog/article/details/30042493

Answer 1

To fetch a page's content with requests, you can do this:

import requests

url = "http://example.com"

r = requests.get(url)

print r.text

See the requests library documentation for details. You can install it with pip into your virtualenv or Python distribution.
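If the goal is to read the redirect target rather than the page body, requests can be told not to follow redirects so that the `Location` header stays visible. The following is only a minimal sketch: `get_redirect_location` is a hypothetical helper, and it assumes the authorization endpoint answers with an HTTP redirect to the callback URL, which would normally require a session that is already logged in and has approved the app. Nothing in the question confirms that Weibo behaves this way.

```python
def get_redirect_location(url, session=None):
    """Request `url` without following redirects; return the Location header.

    Returns None when the server does not answer with a redirect.
    `session` is any object with a requests-style get(); by default a new
    requests.Session is created.
    """
    if session is None:
        import requests  # third-party: pip install requests
        session = requests.Session()

    # allow_redirects=False keeps requests from following the redirect,
    # so the 3xx response and its Location header stay visible.
    response = session.get(url, allow_redirects=False, timeout=10)
    if response.status_code in (301, 302, 303, 307, 308):
        return response.headers.get("Location")
    return None
```

If the assumption holds, the returned URL carries the `code` value in its query string. If the server responds with a login or consent form instead, no redirect happens and the function returns None.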

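For writing a scraper, you can also use scrapy.

Finally, there is one thing I don't understand: if you have an official client, why do you need to parse the contents of the URL to get the data? Doesn't the client give you the data through some simple, easy-to-use functions?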
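For context on the parsing step the question refers to: in the OAuth2 authorization-code flow, the code arrives as a query parameter on the redirect URL, so it can be pulled out with the standard library instead of by eye. A minimal sketch, where `extract_code` is a hypothetical helper and the parameter name `code` is taken from the question's own prompt:

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2


def extract_code(redirect_url):
    """Return the value of the `code` query parameter, or None if absent."""
    query = urlparse(redirect_url).query
    values = parse_qs(query).get("code")
    # parse_qs maps each parameter to a list of values; take the first one.
    return values[0] if values else None
```

With this, the user could paste the whole redirect URL rather than only the part after `code=`; for example, `extract_code("http://example.com/callback?code=abc123")` returns `"abc123"`.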

How can I parse the response URL in Python without actually opening the web page?
