Scraping tweets from a specific URL in Python

Time: 2015-12-03 18:59:43

Tags: python twitter web-scraping

I have been testing the following Python code:

import oauth2 as oauth
import urllib2 as urllib

api_key = "a"
api_secret = "b"
access_token_key = "c"
access_token_secret = "d"

_debug = 0

oauth_token    = oauth.Token(key=access_token_key, secret=access_token_secret)
oauth_consumer = oauth.Consumer(key=api_key, secret=api_secret)

signature_method_hmac_sha1 = oauth.SignatureMethod_HMAC_SHA1()

http_method = "GET"


http_handler  = urllib.HTTPHandler(debuglevel=_debug)
https_handler = urllib.HTTPSHandler(debuglevel=_debug)

def twitterreq(url, method, parameters):
  '''
  Construct, sign, and open a twitter request
  using the hard-coded credentials above.
  '''
  # Use the method passed in by the caller, not the global http_method.
  req = oauth.Request.from_consumer_and_token(oauth_consumer,
                                             token=oauth_token,
                                             http_method=method,
                                             http_url=url,
                                             parameters=parameters)

  req.sign_request(signature_method_hmac_sha1, oauth_consumer, oauth_token)

  headers = req.to_header()

  if method == "POST":
    encoded_post_data = req.to_postdata()
  else:
    # For GET, the signed parameters travel in the query string.
    encoded_post_data = None
    url = req.to_url()

  opener = urllib.OpenerDirector()
  opener.add_handler(http_handler)
  opener.add_handler(https_handler)

  response = opener.open(url, encoded_post_data)

  return response

def fetchsamples():
  url = "myURL"
  parameters = []
  response = twitterreq(url, "GET", parameters)
  for line in response:
    print line.strip()

if __name__ == '__main__':
  fetchsamples()

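As far as I understand, the goal of this code is to fetch tweets from the specific URL "myURL". However, when I run it I get a large amount of front-end HTML and JavaScript, but no tweets. Am I misunderstanding what this code is for? Is there a better way to do what I want?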
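A quick way to confirm the symptom described above is to check whether the response body is HTML or JSON. The sketch below is a hypothetical helper (Python 3, standard library only) and is not part of the original code; the name classify_body is made up for illustration. It assumes that an API endpoint answers with JSON, while an ordinary web page answers with HTML.

```python
import json


def classify_body(body):
    """Return 'json', 'html', or 'unknown' for a response body (bytes or str)."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    text = body.lstrip()

    # A body that parses as JSON most likely came from an API endpoint.
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass

    # A body that opens with an HTML doctype or <html> tag is a web page.
    head = text[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    return "unknown"
```

If the body read from the response comes back as 'html', the URL is probably a regular web page rather than an API endpoint, which would explain the markup and JavaScript in the output.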

1 Answer:

Answer 0: (score: 0)

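I ran into the same problem while doing the Twitter assignment in a Coursera course. In the end I had to consult the Twitter documentation at https://dev.twitter.com/oauth/overview/single-user and make the request the way its example does: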

response = oauth_req(URL, access_token_key, access_token_secret)

I suspect that SignatureMethod_HMAC_SHA1 is unnecessary.
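For context on what SignatureMethod_HMAC_SHA1 is for: an OAuth 1.0a request still has to be signed either way, and as far as I know the oauth2 client signs with HMAC-SHA1 by default, which is why constructing it by hand may be redundant. The sketch below shows roughly what that signing step computes, using only the Python 3 standard library. The names percent_encode and hmac_sha1_signature are hypothetical; this is an illustration of the scheme described in RFC 5849, not the oauth2 library's own code.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # RFC 3986 encoding: only unreserved characters are left as-is.
    return quote(str(value), safe="-._~")


def hmac_sha1_signature(method, url, params, consumer_secret, token_secret=""):
    """Compute an OAuth 1.0a HMAC-SHA1 signature.

    `params` must hold every query, body and oauth_* parameter except
    oauth_signature; `url` must not include a query string.
    """
    # 1. Percent-encode each key and value, then sort the pairs.
    encoded = sorted((percent_encode(k), percent_encode(v))
                     for k, v in params.items())
    param_string = "&".join("{}={}".format(k, v) for k, v in encoded)

    # 2. Signature base string: METHOD & encoded URL & encoded parameters.
    base_string = "&".join([method.upper(),
                            percent_encode(url),
                            percent_encode(param_string)])

    # 3. Signing key: consumer secret and token secret joined by "&".
    signing_key = percent_encode(consumer_secret) + "&" + percent_encode(token_secret)

    # 4. HMAC-SHA1 over the base string, base64-encoded.
    digest = hmac.new(signing_key.encode("utf-8"),
                      base_string.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

The result goes into the oauth_signature parameter of the request. In practice it is simpler to let the library do this, as in the oauth_req call above.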