Extracting 1000 URIs from Twitter using Tweepy and Python

Time: 2017-02-22 20:46:19

Tags: python tweepy

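I am trying to extract 1000 unique, fully expanded URIs from Twitter using Tweepy and Python. Specifically, I am interested in links that point outside of Twitter (so not back to other tweets, retweets, or duplicates).

The code I wrote keeps giving me a KeyError for "entities".

It gives me some URLs before it breaks; some are expanded and some are not. I don't know how to fix this.

Please help!

Note: I have left out my credentials.

Here is my code: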

    # Import the necessary classes from the Tweepy library
    import tweepy
    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    import json

    # Variables that contain the user credentials to access the Twitter API
    access_token = "enter token here"
    access_token_secret = "enter token here"
    consumer_key = "enter key here"
    consumer_secret = "enter key here"

    # Accessing tweepy API
    # api = tweepy.API(auth)

    # This is a basic listener that just prints received tweets to stdout.
    class StdOutListener(StreamListener):
        def on_data(self, data):
            # resource: http://code.runnable.com/Us9rrMiTWf9bAAW3/how-to-stream-data-from-twitter-with-tweepy-for-python
            # Twitter returns data in JSON format - we need to decode it first
            decoded = json.loads(data)

            # resource: http://socialmedia-class.org/twittertutorial.html
            # Print each tweet in the stream to the screen
            # Here we set it to stop after getting 1000 tweets.
            # You don't have to set it to stop, but can continue running
            # the Twitter API to collect data for days or even longer.
            count = 1000

            for url in decoded["entities"]["urls"]:
                count -= 1
                print("%s" % url["expanded_url"] + "\r\n\n")
                if count <= 0:
                    break

        def on_error(self, status):
            print(status)


    if __name__ == '__main__':
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        stream = Stream(auth, l)

        # This line filters Twitter Streams to capture data by the keyword: YouTube
        stream.filter(track=['YouTube'])

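1 answer:

Answer 0 (score: 0)

It looks like the API is hitting a rate limit, so one option is to catch the KeyError when it occurs; when I did that, the message's keys were [u'limit']. I also added a count display to verify that it reaches 1000.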
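To illustrate why the KeyError shows up, here is a minimal sketch that needs no network access. The helper name `expanded_urls` and the two sample messages are made up for illustration (they are hand-written, not real API output); the point is only that a message whose sole key is `limit` has no `entities` entry, so looking the key up with `.get` avoids the exception. The listener below gets the same effect with `try`/`except`.

```python
import json


def expanded_urls(raw):
    """Return the expanded URLs found in one raw stream message.

    Messages that are not tweets (for example limit notices) yield [].
    """
    decoded = json.loads(raw)
    # Limit notices and other control messages carry no "entities" key.
    entities = decoded.get("entities", {})
    return [u["expanded_url"] for u in entities.get("urls", [])
            if u.get("expanded_url")]


# Hand-written sample messages, assumed shapes for illustration only.
tweet = json.dumps({
    "text": "hi",
    "entities": {"urls": [{"url": "https://t.co/x",
                           "expanded_url": "https://www.youtube.com/watch?v=abc"}]},
})
limit_notice = json.dumps({"limit": {"track": 5}})

print(expanded_urls(tweet))
print(expanded_urls(limit_notice))
```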

    import sys

    count = 1000 # moved outside of class definition to avoid getting reset

    class StdOutListener(StreamListener):
        def on_data(self, data):

            decoded = json.loads(data)

            global count # get the count
            if count <= 0:
                # 1000 URLs have been printed, so stop the script
                sys.exit()
            else:
                try:
                    for url in decoded["entities"]["urls"]:
                        count -= 1
                        print("%s : %s\r\n\n" % (count, url["expanded_url"]))

                except KeyError:
                    # Not a tweet (e.g. a limit notice): show which keys it has
                    print(decoded.keys())

        def on_error(self, status):
            print(status)


    if __name__ == '__main__':

        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        stream = Stream(auth, l)

        stream.filter(track=['YouTube'])
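
The code above counts every URL it prints, so duplicates and links back to Twitter still count toward the 1000. A possible way to cover the "unique" and "outside Twitter" parts of the question is sketched below. `UrlCollector`, `is_external` and `TWITTER_HOSTS` are hypothetical names, not part of Tweepy, and the host list is an assumption about what should count as "inside Twitter". Note also that `expanded_url` only undoes Twitter's own t.co wrapping; a link that was shortened elsewhere would stay shortened unless it is resolved separately.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Assumption: hosts that count as "inside Twitter" for this purpose.
TWITTER_HOSTS = ("twitter.com", "t.co")


def is_external(url):
    """True if the URL has a host that is not a Twitter host or subdomain."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    return not any(host == h or host.endswith("." + h) for h in TWITTER_HOSTS)


class UrlCollector(object):
    """Collects unique external URLs until a target count is reached."""

    def __init__(self, target=1000):
        self.target = target
        self.seen = set()  # a set drops duplicates automatically

    def add_from(self, decoded):
        """Add new URLs from one decoded message; return True when done."""
        for u in decoded.get("entities", {}).get("urls", []):
            url = u.get("expanded_url")
            if url and is_external(url) and len(self.seen) < self.target:
                self.seen.add(url)
        return len(self.seen) >= self.target


# Hand-written sample messages, for illustration only.
collector = UrlCollector(target=2)
samples = [
    {"entities": {"urls": [{"expanded_url": "https://www.youtube.com/watch?v=abc"}]}},
    {"entities": {"urls": [{"expanded_url": "https://twitter.com/someone/status/1"}]}},
    {"limit": {"track": 3}},
    {"entities": {"urls": [{"expanded_url": "https://www.youtube.com/watch?v=abc"}]}},
    {"entities": {"urls": [{"expanded_url": "https://example.com/page"}]}},
]
done = [collector.add_from(s) for s in samples]
print(sorted(collector.seen))
```

Inside `on_data`, the listener could call `collector.add_from(decoded)` and stop once it returns True. As far as I know, returning False from `on_data` disconnects the stream in the Tweepy versions that provide `StreamListener`, which would be a gentler exit than `sys.exit()`, but check that against the version you have installed.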