I want to check the status of a list of Twitter user IDs

Time: 2017-09-22 22:40:37

Tags: python twitter twitter-oauth

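As the title says, I want to check the status of a list of Twitter user IDs. I have about 20k Twitter users. I was able to get the timelines of about half of them. The others have probably been suspended or deactivated, or have 0 tweets. I found this script online that is supposed to check the status of a Twitter user. Here is the script (https://github.com/dbrgn/Twitter-User-Checker/blob/master/checkuser.py): `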

#!/usr/bin/env python2

# Twitter User Checker
# Author: Danilo Bargen
# License: GPLv3

import sys
import tweepy
import urllib2
try:
    import json
except ImportError:
    import simplejson as json
from datetime import datetime

auth = tweepy.AppAuthHandler("xxx", "xxxx")

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)

# Continue with rest of code


try:
    user = sys.argv[1]
except IndexError:
    print 'Usage: checkuser.py [username]'
    sys.exit(-1)

url = 'https://api.twitter.com/1.1/users/show.json?id=%s' % user

try:
    request = urllib2.urlopen(url)
    status = request.code
    data = request.read()
except urllib2.HTTPError, e:
    status = e.code
    data = e.read()

data = json.loads(data)

print data

if status == 403:
     print "helloooooooo"
#    if 'suspended' in data['error']:
#        print 'User %s has been suspended' % user
#    else:
#        print 'Unknown response'
elif status == 404:
    print 'User %s not found' % user
elif status == 200:
    days_active = (datetime.now() - datetime.strptime(data['created_at'],
                   '%a %b %d %H:%M:%S +0000 %Y')).days
    print 'User %s is active and has posted %s tweets in %s days' % \
             (user, data['statuses_count'], days_active)
else:
    print 'Unknown response'

`

I get the following error:

  File "twitter_status_checker.py", line 16, in <module>
    auth = tweepy.AppAuthHandler("xxx", "xxxx")
  File "/Users/aloush/anaconda/lib/python2.7/site-packages/tweepy/auth.py", line 170, in __init__
    'but got %s instead' % data.get('token_type'))
tweepy.error.TweepError: Expected token_type to equal "bearer", but got None instead

Can anyone help me fix the error and make the script check a list of users instead of one user per run?

Here is the list of HTTP status codes I want to check: https://dev.twitter.com/overview/api/response-codes

Thanks.

1 Answer:

Answer 0 (score: 1)

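It seems that you failed to authenticate with Twitter. In the latest version (3.5), tweepy uses OAuthHandler for authentication. Please check how to use Tweepy. Also, the script you linked checks accounts one by one, which can be very slow.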

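To check the status of a large number of Twitter accounts with Tweepy, especially if you want to know why an account is inactive (e.g. not found, suspended), you need to be aware of the following: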

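  1. Which API should be used?

     Twitter provides two relevant APIs: users/show and users/lookup. The former returns the profile of one specified user, while the latter returns the profiles of a block of up to 100 users. The corresponding tweepy APIs are API.get_user and API.lookup_users (I could not find the latter in the documentation, but it does exist in the code). You should certainly use the second one. However, when some of the users are inactive, lookup_users returns only the users that are active. This means you have to call get_user to get the detailed reason for each inactive account.

  2. How do you determine a user's status?

     Of course, you should pay attention to the response codes provided by Twitter. However, when using tweepy, you should focus on the error message rather than the HTTP error codes. Here are some common cases:

      • If the profile is fetched successfully, the user is active;
      • Otherwise, check the error code:
        • 50: User not found.
        • 63: User has been suspended.
        • ... there may be more codes related to user accounts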

      With tweepy, a TweepError is raised when the profile cannot be fetched. TweepError.message[0] is the error message from the Twitter API.
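A minimal sketch of turning such an error into a (code, label) pair. It assumes the tweepy 3.x behaviour in which the exception's first argument is the list of error dicts returned by Twitter. `describe_error` and `STATUS_BY_CODE` are hypothetical names introduced here, and the exception is duck-typed so the snippet runs without network access.

```python
# Hypothetical helper, not part of tweepy. Assumes exc.args[0] is Twitter's
# list of {'code': ..., 'message': ...} dicts, as in tweepy 3.x.
STATUS_BY_CODE = {
    50: 'not found',
    63: 'suspended',
}


def describe_error(exc):
    """Return (code, label) for an exception raised by a tweepy call."""
    payload = exc.args[0] if exc.args else None
    if isinstance(payload, list) and payload and isinstance(payload[0], dict):
        code = payload[0].get('code')
        # Fall back to Twitter's own message for codes not listed above.
        label = STATUS_BY_CODE.get(code, payload[0].get('message', 'unknown'))
        return code, label
    # Anything else (plain strings, network errors) carries no API code.
    return None, str(payload)
```

Keeping the mapping in a dict makes it easy to add further account-related codes as you encounter them.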

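      OK, here is the processing logic:

      (1) Split the large list of users into pieces of size 100;

      (2) For each of these pieces, do (3) and (4);

      (3) Call lookup_users; the users returned are treated as active, and the rest as inactive;

      (4) Call get_user for each inactive user to get the detailed reason.

      Here is some sample code for you: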

      import logging
      
      import tweepy
      
      logger = logging.getLogger(__name__)
      
      
      def to_bulk(a, size=100):
          """Split a list into consecutive chunks of at most `size` items
          (only the last chunk may be shorter).
          """
          return [a[i:i + size] for i in range(0, len(a), size)]
      
      
      def fast_check(api, uids):
          """Quickly check the status of the specified accounts.

          Parameters
          ----------
          api : tweepy API instance
          uids : list of account ids (at most 100)

          Returns
          -------
          Tuple (active_uids, inactive_uids).
              `active_uids` is a list of active user ids and
              `inactive_uids` is a list of inactive user ids,
                  either suspended or deactivated.
          """
          try:
              users = api.lookup_users(user_ids=uids,
                                       include_entities=False)
          except tweepy.TweepError as e:
              # api_code holds Twitter's error code; 17 is what users/lookup
              # reports when none of the requested users matches.
              if e.api_code in (17, 50, 63):
                  logger.error('None of the users is valid: %s', e)
                  return [], list(uids)
              # Unexpected error
              raise
          active_uids = [u.id for u in users]
          inactive_uids = list(set(uids) - set(active_uids))
          return active_uids, inactive_uids
      
      
      def check_inactive(api, uids):
          """Check inactive accounts, one by one.

          Parameters
          ----------
          uids : list
              A list of inactive account ids

          Returns
          -------
              Yield tuple (uid, reason), where `uid` is the account id
              and `reason` is a dict with the error `code` and `message`.
          """
          for uid in uids:
              try:
                  api.get_user(user_id=uid)
                  logger.warning('This user %r should be inactive', uid)
                  yield (uid, dict(code=-1, message='OK'))
              except tweepy.TweepError as e:
                  # api_code is Twitter's error code (e.g. 50 or 63);
                  # reason is the error text as a string.
                  yield (uid, dict(code=e.api_code, message=e.reason))
      
      
      def check_one_block(api, uids):
          """Check the status of the users in one block (at most 100 ids)."""
          active_uids, inactive_uids = fast_check(api, uids)
          inactive_users_status = list(check_inactive(api, inactive_uids))
          return active_uids, inactive_users_status
      
      
      def check_status(api, large_uids):
          """Check the status of users for any size of users. """
          active_uids = []
          inactive_users_status = []
          for uids in to_bulk(large_uids, size=100):
              au, iu = check_one_block(api, uids)
              active_uids += au
              inactive_users_status += iu
          return active_uids, inactive_users_status
      
      
      def main(twitter_credential, large_uids):
          """The main function to call check_status."""
          # First prepare tweepy API
          auth = tweepy.OAuthHandler(twitter_credential['consumer_key'],
                                     twitter_credential['consumer_secret'])
          auth.set_access_token(twitter_credential['access_token'],
                                twitter_credential['access_token_secret'])
          api = tweepy.API(auth, wait_on_rate_limit=True)
          # Then, call check_status and hand the result back to the caller
          return check_status(api, large_uids)
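To feed a list of users into the code above and read the result, something like the following sketch could be used. It assumes the IDs are stored one per line in a text file and that each inactive entry is a (uid, reason) pair whose reason is a dict with a `code` key. `load_uids` and `summarize` are hypothetical helpers, not part of tweepy.

```python
import collections


def load_uids(lines):
    """Parse one numeric user ID per line, skipping blanks,
    non-numeric lines and duplicates (assumed file layout)."""
    seen = set()
    uids = []
    for line in lines:
        text = line.strip()
        if not text.isdigit():
            continue
        uid = int(text)
        if uid not in seen:
            seen.add(uid)
            uids.append(uid)
    return uids


def summarize(inactive_users_status):
    """Count inactive accounts per error code from (uid, reason) pairs."""
    return collections.Counter(
        reason.get('code') for _, reason in inactive_users_status)
```

An open file object can be passed straight to `load_uids`, since iterating over it yields lines. Calling `summarize` on the inactive list then shows how many accounts fall under each error code.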
      

      Since I lack the data, I never tested the code. There may be bugs, and you will need to take care of them.
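One way to exercise the batching logic without real Twitter data is a stub object in place of the tweepy API. The sketch below is only an illustration: `StubAPI`, `StubUser`, `StubError` and `check_in_blocks` are hypothetical names, `StubError` stands in for tweepy.TweepError, and the stub does not model the error Twitter raises when a whole block has no match.

```python
class StubError(Exception):
    """Stand-in for tweepy.TweepError in this sketch."""


class StubUser(object):
    def __init__(self, uid):
        self.id = uid


class StubAPI(object):
    """Stand-in for tweepy.API that knows which IDs are active."""

    def __init__(self, active, errors):
        self.active = set(active)
        self.errors = errors          # uid -> error dict
        self.lookup_calls = 0

    def lookup_users(self, user_ids, include_entities=False):
        assert len(user_ids) <= 100   # users/lookup takes at most 100 IDs
        self.lookup_calls += 1
        return [StubUser(u) for u in user_ids if u in self.active]

    def get_user(self, user_id):
        raise StubError([self.errors[user_id]])


def check_in_blocks(api, uids, size=100):
    """Compact version of the lookup-then-get_user logic."""
    active, inactive = [], []
    for start in range(0, len(uids), size):
        block = uids[start:start + size]
        found = set(u.id for u in api.lookup_users(user_ids=block))
        active += [u for u in block if u in found]
        for uid in block:
            if uid in found:
                continue
            try:
                api.get_user(user_id=uid)
            except StubError as exc:
                inactive.append((uid, exc.args[0][0]))
            else:
                inactive.append((uid, {'code': -1, 'message': 'OK'}))
    return active, inactive
```

With 250 IDs this should make three lookup calls, which is a quick sanity check that the chunking respects the 100-ID limit.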

      Hope this helps.