Tweepy: getting all friends of a sample of a Twitter account's friends: how to handle protected users

Date: 2017-03-30 08:57:37

Tags: python twitter tweepy

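I want to find all the friends (meaning the users an account follows) of a sample of one Twitter account's friends, to see which other friends they have in common. The problem is that I don't know how to handle protected accounts, and I keep running into this error: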

tweepy.error.TweepError: Not authorized.

Here is my code:

...
screen_name = ----
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as file:
    ids = file.readlines()

num_samples = 30
ids = [x.strip() for x in ids]
friends = [[] for i in range(num_samples)]

for i in range(0, num_samples):
    id = random.choice(ids)
    for friend in tweepy.Cursor(api.friends_ids, id).items():
        print(friend)
        friends[i].append(friend)

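I have a list of all the friends of one account, screen_name, and I load the friend IDs from it. I then want to sample some of them and look at their friends.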

I also tried something like this:

def limit_handled(cursor, name):
    try:
        yield cursor.next()
    except tweepy.TweepError:
        print("Something went wrong... ", name)
        pass

for i in range(0, num_samples):
    id = random.choice(ids)
    items = tweepy.Cursor(api.friends_ids, id).items()
    for friend in limit_handled(items, id):
        print(friend)
        friends[i].append(friend)

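However, this seems to store only one friend per sampled account before moving on to the next sample. I am new to both Python and Tweepy, so please tell me if anything looks odd.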

1 Answer:

Answer 0: (score: 0)

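First, a couple of comments on naming. The names file and id are built-ins in Python (file only in Python 2), so you should avoid using them as variable names; I have changed them.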
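A minimal sketch of why reusing a built-in name can bite, using made-up ID strings rather than real Twitter data: once id is rebound to a string inside a scope, the built-in id() can no longer be called there.

```python
def shadowing_demo():
    """Show what happens when a local variable reuses the built-in name id."""
    sample_ids = ["111", "222"]  # made-up IDs, for illustration only
    id = sample_ids[0]           # rebinds id: the built-in is hidden in this scope
    try:
        id(sample_ids)           # this now tries to call the string "111"
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)
```

A name such as current_id avoids the problem entirely.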

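Second, if you pass wait_on_rate_limit=True when you initialise the tweepy API, that is enough to handle rate limits; wait_on_rate_limit_notify=True additionally prints a notice whenever it is waiting on a rate limit.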
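As an aside, the limit_handled attempt in the question most likely stored only one friend per sample because a generator body with a single yield produces at most one item. The sketch below reproduces that behaviour without calling Twitter: plain lists stand in for the cursor, and ValueError stands in for tweepy.TweepError (both are assumptions made so the example runs on its own).

```python
def handle_once(iterator):
    """Mirrors the question's limit_handled: one yield, so at most one item."""
    try:
        yield next(iterator)
    except StopIteration:
        return


def handle_all(iterator, skip_errors=(ValueError,)):
    """Keep yielding until the iterator is exhausted or raises one of skip_errors."""
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            return
        except skip_errors as exc:
            # Stand-in for the "Not authorized" case: report it and stop.
            print("Something went wrong:", exc)
            return


first_only = list(handle_once(iter([10, 20, 30])))
everything = list(handle_all(iter([10, 20, 30])))
print(first_only)
print(everything)
```

The loop in handle_all is what lets the generator keep pulling items; without it, the for loop in the caller ends after the first friend.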

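When you set friends = [[] for i in range(num_samples)], you also lose some information, because you can no longer tie the friends you found back to the account they belong to. A dictionary that maps each sampled ID to the friends found for it is easier to work with.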
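A small sketch of the dictionary approach, using made-up account IDs and pages of friend IDs in place of real API results. It accumulates pages with setdefault and extend, which is one possible alternative to concatenating lists on every page.

```python
# Hypothetical pages of friend IDs per sampled account, for illustration only.
pages_by_account = {
    "a1": [[1, 2], [3]],
    "b2": [[2, 3, 4]],
}

friends = {}
for account, pages in pages_by_account.items():
    for page in pages:
        # Create the list on first use, then grow it in place.
        friends.setdefault(account, []).extend(page)

print(friends)
```

Each friend list stays attached to the account it came from, which a list of lists indexed by position cannot guarantee once accounts are skipped.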

My corrected code is as follows:

import tweepy
import random

consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_token_secret = '...'

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication. Use rate limits.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

screen_name = '----'
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as f:
    ids = [x.strip() for x in f.readlines()]

num_samples = 30
friends = dict()

# Initialise i
i = 0

# We want to check that i is less than our number of samples, but we also need to make
# sure there are IDs left to choose from.
while i < num_samples and ids:
    current_id = random.choice(ids)

    # remove the ID we're testing from the list, so we don't pick it again.
    ids.remove(current_id)

    try:
        # try to get friends, and add them to our dictionary value if we can
        # use .get() to cope with the first loop.
        for page in tweepy.Cursor(api.friends_ids, current_id).pages():
            friends[current_id] = friends.get(current_id, []) + page
        i += 1
    except tweepy.TweepError:
        # we get a tweep error when we can't view a user - skip them and move onto the next.
        # don't increment i as we want to replace this user with someone else.
        print('Could not view user {}, skipping...'.format(current_id))

The output is a dictionary, friends, that maps each sampled user ID to the list of that user's friend IDs.
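Since the goal in the question was to see which friends the sampled accounts have in common, here is one way that dictionary might be summarised. This is only a sketch: common_friends and min_accounts are hypothetical names, and the sample data is made up.

```python
from collections import Counter


def common_friends(friends, min_accounts=2):
    """Return {friend_id: count} for friends followed by at least min_accounts accounts."""
    counts = Counter()
    for friend_ids in friends.values():
        # Use a set so each account counts a given friend at most once.
        counts.update(set(friend_ids))
    return {fid: n for fid, n in counts.items() if n >= min_accounts}


# Made-up data in the same shape as the friends dictionary above.
sample_friends = {"111": [1, 2, 3], "222": [2, 3, 4], "333": [3, 5]}
shared = common_friends(sample_friends)
print(shared)
```

Raising min_accounts narrows the result to friends shared by more of the sampled accounts.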