Question

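I'm using Google App Engine (Python) as the backend for a mobile social game. The game uses Twitter integration to let people follow relative leaderboards and play against their friends or followers.

By far the hardest piece has been the background (push) task that hits the Twitter API to query for the friends and followers of a given user, and then stores that data in our datastore. I'm trying to optimize it to reduce costs as much as possible.

The Data Model:

There are three main models related to this part of the app: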

User
'''General user info, like scores and stats'''
# key id => randomly generated string that uniquely identifies a user
#   along the lines of user_kdsgj326
#   (I realize I probably should have just used the integer ID that GAE
#   creates, but it's too late for that)

AuthAccount
'''Authentication mechanism.
     A user may have multiple auth accounts- one for each provider'''
# key id => concatenation of the auth provider and the auth provider's unique
#   ID for that user, ie, "tw:555555", where '555555' is their twitter ID
auth_id = ndb.StringProperty(indexed=True) # ie, '555555'
user = ndb.KeyProperty(kind=User, indexed=True)
extra_data = ndb.JsonProperty(indexed=False) # twitter picture url, name, etc.

RelativeUserScore
'''Denormalization for quickly generated relative leaderboards'''
# key id => same as their User id, ie, user_kdsgj326, so that we can quickly
#     retrieve the object for each user
follower_ids = ndb.StringProperty(indexed=True, repeated=True)
# misc properties for the user's score, name, etc. needed for leaderboard

I don't think it's necessary for this question, but just in case, here is a more detailed discussion that led to this design.

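The Task

The background thread receives the Twitter authentication data and requests a chunk of friend IDs from the Twitter API, via tweepy. By default Twitter sends up to 5000 friend IDs at a time, and I'd rather not arbitrarily limit that further if I can avoid it (you can only make so many requests to their API per minute).

Once I have the list of friend IDs, I can easily translate them into "tw:" AuthAccount key IDs and use get_multi to retrieve the AuthAccounts. Then I remove all of the None results for Twitter users who are not in our system, and collect the user IDs of the Twitter friends who are. These IDs are also the keys of the RelativeUserScores, so I use a bunch of transactional_tasklets to add this user's ID to each RelativeUserScore's followers list.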
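The lookup step described above can be sketched without App Engine as follows. This is only an illustration: `auth_key_id`, `friends_in_system`, and `fake_datastore` are hypothetical names, and the plain dict stands in for a batch get, which returns None for entities that do not exist.

```python
def auth_key_id(twitter_id):
    """Build the AuthAccount key id for a Twitter ID, e.g. 555555 -> 'tw:555555'."""
    return "tw:%s" % twitter_id


def friends_in_system(twitter_friend_ids, fake_datastore):
    """Return the system user IDs of the Twitter friends that have an AuthAccount.

    fake_datastore is a plain dict (an assumption for this sketch) mapping an
    AuthAccount key id to a system user ID. A missing key yields None, which
    mirrors a batch get returning None for entities that do not exist.
    """
    key_ids = [auth_key_id(tid) for tid in twitter_friend_ids]
    results = [fake_datastore.get(key_id) for key_id in key_ids]
    # Drop the None placeholders for friends who are not in our system.
    return [user_id for user_id in results if user_id is not None]


fake_datastore = {"tw:555555": "user_kdsgj326", "tw:777": "user_abc"}
print(friends_in_system([111, 555555, 777, 999], fake_datastore))
```

Because the data is sparse, most entries in `results` are None in practice, which is the cost the questions below try to reduce.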

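Optimization Questions

The first thing that happens is a call to Twitter's API. Given that this is required for everything else in the task, I'm assuming I would not gain anything by making it asynchronous, correct? (Is GAE already smart enough to use the server for handling other tasks while this one blocks?)
When determining whether a Twitter friend is playing our game, I currently convert all Twitter friend IDs to auth account IDs and retrieve them with get_multi. Given that this data is sparse (most Twitter friends will most likely not be playing our game), would I be better off with a projection query that retrieves the user ID directly? Something like...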
```
twitter_friend_ids = twitter_api.friend_ids() # potentially 5000 values
friend_system_ids = AuthAccount\
    .query(AuthAccount.auth_id.IN(twitter_friend_ids))\
    .fetch(projection=[AuthAccount.user])  # the model defines this property as "user"
```
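(I can't remember or find where, but I read that this is better because you don't waste time trying to read model objects that don't exist.)
Whether I end up using get_multi or a projection query, is there any benefit to breaking up the request into multiple async queries, instead of trying to get or query for potentially 5000 objects at once?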
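For the last question, splitting the IDs into batches could look roughly like the sketch below. The helper name `chunked` and the batch size of 500 are assumptions for illustration, not recommendations; in ndb, each batch of keys could then be handed to `ndb.get_multi_async` and the futures collected afterwards.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# 5000 friend IDs split into batches of 500 (batch size is an arbitrary choice).
batches = chunked(list(range(5000)), 500)
print(len(batches), len(batches[0]))
```

Whether this beats a single large call depends on the workload, so it is worth measuring both before committing to one.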

Answer 1

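I would organize the task like this:

- Make an asynchronous fetch call to the Twitter feed
- Use memcache to hold all of the AuthAccount->User data:
  - Request the data from memcache; if it doesn't exist, make a fetch_async() call on AuthAccount to populate memcache and a local dict
- Run each Twitter ID through the dict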

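Here is some sample code:

from google.appengine.api import memcache
from google.appengine.ext import ndb

future = twitter_api.friend_ids()    # make this asynchronous

auth_users = memcache.get('auth_users')
if auth_users is None:
    # Projection query: read only the two properties needed for the lookup.
    auth_accounts = AuthAccount.query().fetch(
        projection=[AuthAccount.auth_id, AuthAccount.user])
    auth_users = dict([(a.auth_id, a.user) for a in auth_accounts])
    memcache.add('auth_users', auth_users, 60)  # cache for 60 seconds

twitter_friend_ids = future.get_result()  # get async twitter results

friend_system_ids = []
for id in twitter_friend_ids:
    # auth_id holds the bare Twitter ID ('555555'); only the key id has the "tw:" prefix.
    friend_id = auth_users.get(str(id))
    if friend_id:
        friend_system_ids.append(friend_id)

This is optimized for a relatively small number of users and a high request rate. Your comments above indicate a higher number of users and a lower request rate, so I would only make this change to your code:

twitter_friend_ids = twitter_api.friend_ids() # potentially 5000 values
auth_account_keys = [ndb.Key("AuthAccount", "tw:%s" % id) for id in twitter_friend_ids]
friend_system_ids = filter(None, ndb.get_multi(auth_account_keys))

This will use ndb's built-in memcache to hold the data when using get_multi() with keys.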
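The first approach in this answer (load the whole AuthAccount mapping once and cache it for 60 seconds) can be tried outside App Engine with stand-ins. In this minimal sketch, `TinyCache`, `load_auth_users`, and `fake_fetch_all` are hypothetical names: the class only imitates the get/add behaviour of memcache, and the fetch function stands in for the projection query.

```python
import time


class TinyCache(object):
    """Minimal in-memory stand-in for memcache (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def add(self, key, value, ttl_seconds):
        # Like memcache.add, only store the value if the key is not already set.
        if self.get(key) is None:
            self._data[key] = (value, time.time() + ttl_seconds)
            return True
        return False


def load_auth_users(cache, fetch_all):
    """Return the auth_id -> user mapping, calling fetch_all only on a cache miss."""
    auth_users = cache.get("auth_users")
    if auth_users is None:
        auth_users = dict(fetch_all())
        cache.add("auth_users", auth_users, 60)
    return auth_users


calls = []


def fake_fetch_all():
    # Stand-in for the projection query over AuthAccount; records each call.
    calls.append(1)
    return [("555555", "user_kdsgj326")]


cache = TinyCache()
first = load_auth_users(cache, fake_fetch_all)
second = load_auth_users(cache, fake_fetch_all)
print(first == second, len(calls))
```

The second call is served from the cache, so the expensive fetch runs once per expiry window rather than once per task.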
