
Algorithm: Find friends from a list of users

Time: 2018-07-25 03:10:20

Tags: algorithm

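Scenario: In my app, users can follow a post. They receive a notification whenever one of their friends likes that post. The problem becomes non-trivial when thousands of users follow and like a post.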

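My current approach is simple: when a new user likes a post, iterate over all users who follow the post and check whether the new user is in each follower's friends list (assume an average size of N). The friends lists are indexed, so each lookup is O(log N). This means each new like costs O(N log N) if N people follow the post, and if k users like it, the overall complexity is O(kN log N). Can I do better?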
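For reference, the approach described above might look roughly like the sketch below. The names `followers_to_notify`, `followers`, and `friends_of` are hypothetical, and the friends lists are modelled as in-memory Python sets (hash-based, so lookups are O(1) on average rather than the O(log N) of an index); a real Firestore-backed version would read these from the database instead.

```python
from typing import Dict, List, Set


def followers_to_notify(
    liker: str,
    followers: List[str],
    friends_of: Dict[str, Set[str]],
) -> List[str]:
    """Return the followers of a post who have `liker` in their friends list.

    One membership test per follower, so the cost grows with the number
    of followers for every new like.
    """
    return [f for f in followers if liker in friends_of.get(f, set())]


# Hypothetical sample data: each user maps to their set of friends.
friends_of = {
    "alice": {"bob"},
    "carol": {"dave"},
    "erin": {"bob", "dave"},
}
notified = followers_to_notify("bob", ["alice", "carol", "erin"], friends_of)
```

With this sample data, `notified` contains the two followers who list "bob" as a friend.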

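Notes:

Notifications do not have to be immediate, and they do not have to happen 100% of the time
Posts are created by users
In case it is relevant, I am using the NoSQL database Firestore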

2 answers:

Answer 0: (score: 0)

What you need is a hybrid approach. Take advantage of the fact that the user's friends list might be shorter than the list of followers, or vice versa. There are two options:

Do what you do now, and check every follower against the new user's friends list. The time complexity reflects the number of followers.
Do the reverse, and check every friend of the user against the followers list for the post. The time complexity reflects the number of friends of the user.

Armed with these two tactics, we can design an algorithm that picks whichever of the two will perform better.

Keep a running count of the number of friends of each user and of the followers of each post. When someone likes a post, if they have fewer friends than the post has followers, it's faster to check whether each friend is in the followers list (use a self-balancing BST or hash table in the implementation). If the post has fewer followers than the user has friends, the reverse is faster.
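A minimal sketch of that choice, assuming friendship is symmetric and that both the followers of a post and each user's friends are available as hash sets. The function name `friends_following` and the dictionary `friends_of` are made up for illustration:

```python
from typing import Dict, Set


def friends_following(
    liker: str,
    followers: Set[str],
    friends_of: Dict[str, Set[str]],
) -> Set[str]:
    """Return the liker's friends who follow the post.

    Assumes symmetric friendship, so the liker's own friends list is
    enough to decide who should be notified.
    """
    liker_friends = friends_of.get(liker, set())
    if len(liker_friends) <= len(followers):
        # Fewer friends than followers: probe the followers set once per friend.
        return {u for u in liker_friends if u in followers}
    # Fewer followers than friends: probe the friends set once per follower.
    return {u for u in followers if u in liker_friends}
```

Either branch returns the same set; only the number of membership tests differs, since the loop always runs over the smaller collection.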

If there are N followers, K users liking the post, and F friends per user, checking friend->follower gives a run time of O(K*F*log(N)), and checking follower->friend gives O(K*N*log(F)). The worst case remains the same. However, if you are only concerned about theoretical time bounds, you could replace your index with a hash table, which gives O(1) expected lookups instead of O(log(n)).

Answer 1: (score: 0)

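I think it can be improved to N^2 + k logN^2 by using more memory. Fundamentally, this is the problem of finding the intersection of two sets (the set of the new liker's friends and the set of followers, or the set of the followers' friends and the set of likers). Since lookups are cheap, we want the set being looked up to be as large as possible. So if we put all friends of all followers into one big set (more specifically, a map) of size N^2, then k liking users cost k logN^2 in lookups, plus N^2 to build the set.

Another benefit of pooling the friends together is that many users share friends, so the actual size may be smaller than N^2.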
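One way to picture the map is sketched below, assuming everything fits in memory and using a Python dict of sets (hash-based, so each lookup is O(1) on average rather than O(log N^2)). The names `build_friend_index`, `lookup_followers`, and `friends_of` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_friend_index(
    followers: Iterable[str],
    friends_of: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each friend of any follower to the followers who list that friend.

    Built once per post; friends shared by several followers collapse
    into a single key, so the map can be smaller than followers x friends.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for follower in followers:
        for friend in friends_of.get(follower, set()):
            index[friend].add(follower)
    return index


def lookup_followers(index: Dict[str, Set[str]], liker: str) -> Set[str]:
    """Return the followers to notify when `liker` likes the post."""
    return index.get(liker, set())


# Hypothetical sample data: each follower maps to their set of friends.
friends_of = {
    "alice": {"bob"},
    "carol": {"dave"},
    "erin": {"bob", "dave"},
}
index = build_friend_index(["alice", "carol", "erin"], friends_of)
to_notify = lookup_followers(index, "bob")
```

Each new like then needs a single lookup in `index`. The trade-off is that the map has to be kept up to date as users follow or unfollow the post and as friendships change.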