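As the title says, I want to check the status of a list of Twitter user IDs. I have about 20k Twitter users, and I was able to fetch the timelines of roughly half of them. The others have probably been suspended, deactivated, or have 0 tweets. I found this script online that is supposed to check the status of a Twitter user. Here is the script (https://github.com/dbrgn/Twitter-User-Checker/blob/master/checkuser.py): `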
#!/usr/bin/env python2
# Twitter User Checker
# Author: Danilo Bargen
# License: GPLv3
import sys
import tweepy
import urllib2
try:
    import json
except ImportError:
    import simplejson as json
from datetime import datetime
auth = tweepy.AppAuthHandler("xxx", "xxxx")
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)
# Continue with rest of code
try:
    user = sys.argv[1]
except IndexError:
    print 'Usage: checkuser.py [username]'
    sys.exit(-1)
url = 'https://api.twitter.com/1.1/users/show.json?id=%s' % user
try:
    request = urllib2.urlopen(url)
    status = request.code
    data = request.read()
except urllib2.HTTPError, e:
    status = e.code
    data = e.read()
data = json.loads(data)
print data
if status == 403:
    print "helloooooooo"
    # if 'suspended' in data['error']:
    #     print 'User %s has been suspended' % user
    # else:
    #     print 'Unknown response'
elif status == 404:
    print 'User %s not found' % user
elif status == 200:
    days_active = (datetime.now() - datetime.strptime(data['created_at'],
                   '%a %b %d %H:%M:%S +0000 %Y')).days
    print 'User %s is active and has posted %s tweets in %s days' % \
        (user, data['statuses_count'], days_active)
else:
    print 'Unknown response'
`
I get the following error:
  File "twitter_status_checker.py", line 16, in <module>
    auth = tweepy.AppAuthHandler("xxx", "xxxx")
  File "/Users/aloush/anaconda/lib/python2.7/site-packages/tweepy/auth.py", line 170, in __init__
    'but got %s instead' % data.get('token_type'))
tweepy.error.TweepError: Expected token_type to equal "bearer", but got None instead
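Can anyone help me fix the error and adapt the script so that it checks a list of users instead of one user per run?
Here is the list of HTTP status codes I want to check for: https://dev.twitter.com/overview/api/response-codes
Thanks.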
Answer 0 (score: 1)
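It looks like you failed to authenticate with Twitter. As of the latest version (3.5), tweepy uses OAuthHandler for authentication. Please check how to use Tweepy. Also, the script you linked checks accounts one at a time, which can be very slow.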
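A minimal sketch of the OAuth setup, assuming tweepy 3.x and that the four credentials live in environment variables. The variable names and both helper functions are made up for this illustration; only `tweepy.OAuthHandler`, `set_access_token`, and `tweepy.API` come from tweepy itself. Checking the values first makes a missing or empty credential fail early with a clear message.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_KEYS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Collect the four OAuth values, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_KEYS.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing credentials: %s" % ", ".join(sorted(missing)))
    return {key: env[var] for key, var in REQUIRED_KEYS.items()}


def build_api(credentials):
    """Create an authenticated tweepy API object (tweepy 3.x style)."""
    import tweepy  # imported here so load_credentials works without tweepy

    auth = tweepy.OAuthHandler(credentials["consumer_key"],
                               credentials["consumer_secret"])
    auth.set_access_token(credentials["access_token"],
                          credentials["access_token_secret"])
    return tweepy.API(auth, wait_on_rate_limit=True)
```

With real keys in place, `api = build_api(load_credentials())` would give the object that the rest of this answer passes around as `api`.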
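To check the status of a large number of Twitter accounts with Tweepy, especially if you want to know why an account is inactive (e.g., not found or suspended), you need to know the following: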
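Twitter provides two relevant APIs: users/show and users/lookup. The former returns the profile of a single specified user, while the latter returns profiles in blocks of up to 100 users. The corresponding tweepy methods are API.get_user and API.lookup_users (I could not find the latter in the documentation, but it does exist in the code). Naturally, you should use the second one. However, when some of the users are inactive, the lookup_users API returns only the users that are active. This means you have to call the get_user API to get the detailed reason for each inactive account.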
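To illustrate why the follow-up call is needed, here is a small sketch that compares the requested IDs with what a bulk lookup returns. `StubAPI` and `FakeUser` are stand-ins invented for this example so that it runs offline; with a real tweepy 3.x API object you would pass that object to `find_missing` instead.

```python
from collections import namedtuple

# Stand-in for tweepy's User model; only the `id` attribute is used here.
FakeUser = namedtuple("FakeUser", ["id"])


class StubAPI(object):
    """Offline stand-in that mimics lookup_users dropping inactive ids."""

    def __init__(self, active_ids):
        self.active_ids = set(active_ids)

    def lookup_users(self, user_ids=None, include_entities=None):
        return [FakeUser(uid) for uid in user_ids if uid in self.active_ids]


def find_missing(api, uids):
    """Return the ids that the bulk lookup left out, in input order."""
    returned = {u.id for u in api.lookup_users(user_ids=uids,
                                               include_entities=False)}
    return [uid for uid in uids if uid not in returned]


api = StubAPI(active_ids=[1, 3])
print(find_missing(api, [1, 2, 3, 4]))  # prints [2, 4]
```

The bulk call gives no reason for the missing IDs, which is why each of them still needs an individual lookup.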
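Of course, you should pay attention to the response codes that Twitter provides. However, when using tweepy, you should focus more on the error message than on the HTTP error codes. Here are some common cases:
With tweepy, a TweepError is raised when a profile cannot be fetched. TweepError.message[0] is the error message from the Twitter API.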
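As a sketch of how such an error entry could be turned into a readable status: the function below takes one entry (a dict with `code` and `message` keys) and maps it to a label. Codes 50 (user not found) and 63 (user suspended) are the ones checked in the sample code of this answer; the label names and the `describe_error` helper are assumptions made for this example.

```python
# Twitter API error codes -> short labels (the labels are illustrative).
KNOWN_REASONS = {
    50: "not_found",
    63: "suspended",
}


def describe_error(error):
    """Map one Twitter error entry, e.g. {'code': 63, 'message': '...'},
    to a (label, message) pair. Unlisted codes become 'unknown'."""
    code = error.get("code")
    label = KNOWN_REASONS.get(code, "unknown")
    return label, error.get("message", "")


print(describe_error({"code": 63, "message": "User has been suspended."}))
# prints ('suspended', 'User has been suspended.')
```

Keeping the mapping in one dictionary makes it easy to add further codes from Twitter's response-code list as you encounter them.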
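OK, here is the processing logic:
(1) Split the large list of users into chunks of 100;
(2) For each chunk, do (3) and (4);
(3) Call lookup_users; the users it returns are treated as active, and the rest as inactive;
(4) Call get_user for each inactive user to get the detailed reason.
Here is sample code for you: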
# Written against tweepy 3.x (TweepError, lookup_users(user_ids=...)).
import logging

import tweepy

logger = logging.getLogger(__name__)


def to_bulk(a, size=100):
    """Split a list into a list of lists. Each element of the new list is a
    list of length `size` (except possibly the last one).
    """
    return [a[i:i + size] for i in range(0, len(a), size)]


def fast_check(api, uids):
    """Fast check the status of specified accounts.

    Parameters
    ----------
    api: tweepy API instance
    uids: list of account ids (ints), at most 100

    Returns
    -------
    Tuple (active_uids, inactive_uids).
    `active_uids` is a list of active uids and
    `inactive_uids` is a list of inactive uids,
    either suspended or deactivated.
    """
    try:
        users = api.lookup_users(user_ids=uids,
                                 include_entities=False)
        active_uids = [u.id for u in users]
        inactive_uids = list(set(uids) - set(active_uids))
        return active_uids, inactive_uids
    except tweepy.TweepError as e:
        # 17: no user matches, 50: user not found, 63: user suspended
        if e.api_code in (17, 50, 63):
            logger.error('None of the users is valid: %s', e)
            # The whole block failed, so every uid counts as inactive.
            return [], list(uids)
        else:
            # Unexpected error
            raise


def check_inactive(api, uids):
    """Check inactive accounts, one by one.

    Parameters
    ----------
    uids : list
        A list of inactive account ids

    Returns
    -------
    Yield tuple (uid, reason), where `uid` is the account id
    and `reason` is a dict with `code` and `message` keys.
    """
    for uid in uids:
        try:
            api.get_user(user_id=uid)
            # lookup_users missed this user, yet get_user succeeded.
            logger.warning('This user %r should be inactive', uid)
            yield (uid, dict(code=-1, message='OK'))
        except tweepy.TweepError as e:
            # api_code is Twitter's error code, e.g. 50 or 63.
            yield (uid, dict(code=e.api_code, message=e.reason))


def check_one_block(api, uids):
    """Check the status of users for one block (at most 100 users)."""
    active_uids, inactive_uids = fast_check(api, uids)
    inactive_users_status = list(check_inactive(api, inactive_uids))
    return active_uids, inactive_users_status


def check_status(api, large_uids):
    """Check the status of users for a list of any size."""
    active_uids = []
    inactive_users_status = []
    for uids in to_bulk(large_uids, size=100):
        au, iu = check_one_block(api, uids)
        active_uids += au
        inactive_users_status += iu
    return active_uids, inactive_users_status


def main(twitter_credential, large_uids):
    """The main function to call check_status."""
    # First prepare tweepy API
    auth = tweepy.OAuthHandler(twitter_credential['consumer_key'],
                               twitter_credential['consumer_secret'])
    auth.set_access_token(twitter_credential['access_token'],
                          twitter_credential['access_token_secret'])
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # Then, call check_status and hand the results back to the caller
    return check_status(api, large_uids)
I have never tested this code because I lack the data. It may contain bugs, which you will need to fix yourself.
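Since the goal is to check a whole list rather than one user per run, a small wrapper for input and output may help. This is a Python 3 sketch under assumptions: the IDs are stored one per line in a text file, the report is written as CSV, and the helper names `load_uids` and `write_report` are invented here. It expects results in the shape that `check_status` returns: a list of active ids, and a list of `(uid, reason)` pairs where `reason` has `code` and `message` keys.

```python
import csv


def load_uids(path):
    """Read one numeric user id per line, skipping blank lines."""
    with open(path) as handle:
        return [int(line) for line in handle if line.strip()]


def write_report(path, active_uids, inactive_users_status):
    """Write one CSV row per user: uid, error code, message."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["uid", "code", "message"])
        for uid in active_uids:
            # Active users have no error code.
            writer.writerow([uid, "", "active"])
        for uid, reason in inactive_users_status:
            writer.writerow([uid, reason.get("code"), reason.get("message")])
```

Reading the IDs as integers matters because the comparison against the `id` values of the returned users would not match if the IDs were strings.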
Hope this helps.