Scala: optimizing collection operations

Asked: 2015-12-02 08:13:20

Tags: scala optimization collections

The operation computes, for each user, the number of users reachable within six degrees of friendship.

Each user may have zero or more friends. The table structure is as follows:

+----------+---------+------+-----+---------+----------------+
| Field    | Type    | Null | Key | Default | Extra          |
+----------+---------+------+-----+---------+----------------+
| id       | int(11) | NO   | PRI | NULL    | auto_increment |
| userId   | int(11) | NO   | MUL | NULL    |                |
| friendId | int(11) | NO   |     | NULL    |                |
+----------+---------+------+-----+---------+----------------+

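I currently fetch all relationship records from the database
and turn them into a Map[Long, Set[Long]], which maps each user's id to the set of that user's friend IDs.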

val friendMap = friends.groupBy(_.userId) map { group =>
  group._1 -> group._2.map(_.friendId).toSet
}
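To make the shape of friendMap concrete, here is a minimal sketch on a few sample rows. The `Friend` case class, the `FriendMapExample` object, and the sample data are assumptions for illustration; the question does not show the real row type.

```scala
// Hypothetical row type mirroring the table above (not shown in the question).
final case class Friend(id: Long, userId: Long, friendId: Long)

object FriendMapExample {
  // Group rows by userId and keep only the friend IDs of each group.
  def buildFriendMap(friends: Seq[Friend]): Map[Long, Set[Long]] =
    friends.groupBy(_.userId).map { case (userId, rows) =>
      userId -> rows.map(_.friendId).toSet
    }

  // Made-up sample rows: user 1 has friends 2 and 3, and so on.
  val sample: Seq[Friend] = Seq(
    Friend(1, 1, 2),
    Friend(2, 1, 3),
    Friend(3, 2, 1),
    Friend(4, 3, 4)
  )

  val friendMap: Map[Long, Set[Long]] = buildFriendMap(sample)
  // friendMap == Map(1L -> Set(2L, 3L), 2L -> Set(1L), 3L -> Set(4L))
}
```

User 4 appears only as a friendId, so it has no key in the map. That is why later lookups go through getOrElse with an empty set.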

Then I compute the number of friends within six degrees for each user:

val sixDegreeFriendCountMap = friendMap map { m =>
    val (userId, friendIds) = m
    val twoDegree = friendIds.flatMap(id => friendMap.getOrElse(id, Set())) --
        friendIds
    val threeDegree = twoDegree.flatMap(id => friendMap.getOrElse(id, Set())) --
        friendIds -- twoDegree
    val fourDegree = threeDegree.flatMap(id => friendMap.getOrElse(id, Set())) --
        friendIds -- twoDegree -- threeDegree
    val fiveDegree = fourDegree.flatMap(id => friendMap.getOrElse(id, Set())) --
        friendIds -- twoDegree -- threeDegree -- fourDegree
    val sixDegree = fiveDegree.flatMap(id => friendMap.getOrElse(id, Set())) --
        friendIds -- twoDegree -- threeDegree -- fourDegree -- fiveDegree

    val all = friendIds ++ twoDegree ++ threeDegree ++ fourDegree ++ fiveDegree ++ sixDegree

    userId -> all.size
}
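For comparison, the same level-by-level expansion can be written as a breadth-first search with one mutable visited set, so earlier levels are not subtracted again at every step. This is only a sketch under assumptions: `SixDegrees` and `reachableCount` are hypothetical names, and unlike the code above it never counts the starting user, so results can differ by one when a user is reachable from themselves. Whether it is fast enough depends on how large the six-degree neighbourhoods are.

```scala
import scala.collection.mutable

object SixDegrees {
  // Count the distinct users reachable from `start` in at most `maxDepth` hops.
  // `start` itself is never counted.
  def reachableCount(friendMap: Map[Long, Set[Long]], start: Long, maxDepth: Int = 6): Int = {
    val visited = mutable.HashSet[Long](start)
    var frontier: Set[Long] = Set(start)
    var depth = 0
    while (depth < maxDepth && frontier.nonEmpty) {
      // Expand the frontier by one hop and keep only users not seen before.
      val next = frontier
        .flatMap(id => friendMap.getOrElse(id, Set.empty[Long]))
        .filterNot(visited.contains)
      visited ++= next
      frontier = next
      depth += 1
    }
    visited.size - 1 // exclude `start`
  }
}
```

A per-user result map could then be built with `friendMap.keys.map(id => id -> SixDegrees.reachableCount(friendMap, id)).toMap`. Each user's count is independent of the others, so the work could also be split across threads.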

I can get the result from sixDegreeFriendCountMap, but the problem is that the computation takes about 500 ms per user, and I have 300,000 users.

So the program runs for more than 40 hours.
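The 40-hour figure follows directly from the two numbers above. A quick check, assuming the users are processed one after another (`RuntimeEstimate` is a hypothetical name):

```scala
object RuntimeEstimate {
  val users = 300000         // user count from the question
  val millisPerUser = 500.0  // per-user cost from the question

  // Total sequential runtime, converted from milliseconds to hours.
  val totalHours: Double = users * millisPerUser / 1000.0 / 3600.0
  // totalHours is roughly 41.7
}
```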

Any suggestions for improving the algorithm that computes sixDegreeFriendCountMap?

0 answers:

No answers