DB-intensive queries are making load times crazy

Time: 2015-03-02 11:22:59

Tags: sql ruby-on-rails performance algorithm

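I'm trying to implement a content-based recommender for my tweet application, and I think I've managed to do it. The problem is that my solution is so database-intensive that load times are far too long, so I've come here for help. I'll post the algorithm first and then explain it.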

def candidates2(user)
  @follower_tweet_string = "" ## all the text from all the tweets of the user and of everyone the user follows
  scoreHash = Hash.new ## similarity scores computed by the TfIdfSimilarity gem, keyed by email
  @rezultat = [] ## the array of users returned
  @users = User.all ## all the users
  @following = user.following + Array(user) ## all the users the current user is following + the user

  @following.each do |followee|
    @tweets = followee.feed ## feed is a method for requesting all the tweets of that person
    @tweets.each do |tweet|
      ## collect the tweet text; the space keeps words from adjacent tweets apart
      @follower_tweet_string = @follower_tweet_string + tweet.content + " "
    end
  end

  @rest_of_users = @users - @following ## all the users that the current user is not following

  document1 = TfIdfSimilarity::Document.new(@follower_tweet_string)
  corpus = [document1]

  @rest_of_users.each do |person|
    @rest_of_users_strings = "" ## all the tweet text of this one candidate, reset for each candidate
    @tweets = person.feed ## one more query per candidate user
    @tweets.each do |tweet|
      @rest_of_users_strings = @rest_of_users_strings + tweet.content + " "
    end

    ## calculating the score between the user's document and the candidate's document
    document2 = TfIdfSimilarity::Document.new(@rest_of_users_strings)
    corpus = corpus + Array(document2)

    model = TfIdfSimilarity::TfIdfModel.new(corpus)
    matrix = model.similarity_matrix
    scoreHash[person.email] = matrix[model.document_index(document1), model.document_index(document2)]
    corpus = corpus - Array(document2)
    ## stop calculating the score
  end

  ## keep the five highest-scoring candidates
  sortedHash = Hash[scoreHash.sort_by { |email, score| score }.reverse[0..4]]

  @rest_of_users.each do |rank|
    if sortedHash[rank.email]
      @rezultat = @rezultat + Array(rank) ## collecting the resulting users
    end
  end

  @rezultat ## returning the resulting users
end

The algorithm can be found here, on page 6, section 3.2, "Content-based recommenders" (about 20 lines of explanation).

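The main problem with my algorithm is that I have to fetch every user the current user is not following, then fetch all of their tweets, and only then apply the algorithm. That is extremely DB-intensive, to the point of being crazy. I can't keep doing it this way. Any ideas on how to improve this?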

1 answer:

Answer 0: (score: 1)

You should separate generating the recommendations from displaying them.

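That is, you have a batch job that processes the tweets, generates the recommendations, and stores them in the database. This job runs periodically.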

Separately, you have a web interface that queries the database for the current recommendations and displays them.

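Now load times are fast, and so are web response times. Your performance problem now shows up as how often you can run the batch job. That is an environment where latency is not an issue, and the problem is easier to address with techniques such as running jobs in parallel.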
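
As a rough illustration of that split, here is a minimal sketch in plain Ruby with no Rails. Everything in it is an assumption made for the example: `RecommendationStore` is an in-memory stand-in for a database table, `generate_recommendations` and `recommendations_for` are hypothetical names, and cosine similarity over raw word counts is a simple placeholder for the TF-IDF scoring done by the gem in the question.

```ruby
# Hypothetical stand-in for a database table of precomputed recommendations.
class RecommendationStore
  def initialize
    @rows = {}
  end

  # Used by the batch job: overwrite the stored list for one user.
  def save(user_id, recommended_ids)
    @rows[user_id] = recommended_ids
  end

  # Used by the web request: a single cheap lookup, no scoring.
  def fetch(user_id)
    @rows.fetch(user_id, [])
  end
end

# Placeholder scoring: word counts and cosine similarity (not TF-IDF).
def word_counts(text)
  text.downcase.scan(/\w+/).each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
end

def cosine(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm = Math.sqrt(a.values.sum { |c| c * c }) * Math.sqrt(b.values.sum { |c| c * c })
  norm.zero? ? 0.0 : dot / norm
end

# Batch job: the expensive part, run periodically and outside any web request.
# tweets_by_user maps user id => array of tweet texts;
# following maps user id => array of followed user ids.
def generate_recommendations(tweets_by_user, following, store, limit: 5)
  # Tokenize each user's tweets once instead of once per comparison.
  counts = tweets_by_user.transform_values { |tweets| word_counts(tweets.join(" ")) }

  tweets_by_user.each_key do |user_id|
    followed = following.fetch(user_id, []) + [user_id]
    profile = word_counts(followed.flat_map { |id| tweets_by_user.fetch(id, []) }.join(" "))

    candidates = tweets_by_user.keys - followed
    ranked = candidates.sort_by { |id| -cosine(profile, counts[id]) }
    store.save(user_id, ranked.first(limit))
  end
end

# Web side: only reads what the batch job stored.
def recommendations_for(user_id, store)
  store.fetch(user_id)
end

tweets_by_user = {
  1 => ["ruby on rails performance tips"],
  2 => ["rails performance and sql tuning"],
  3 => ["baking sourdough bread at home"],
  4 => ["ruby gems worth knowing"]
}
following = { 1 => [4] }

store = RecommendationStore.new
generate_recommendations(tweets_by_user, following, store) # slow, scheduled
recommendations_for(1, store)                              # fast, per request => [2, 3]
```

In a Rails application the store would presumably be an ordinary table and the batch method would be triggered by a scheduler or a background job. Because each user's list is computed independently of the others, slices of users could also be handed to separate workers, which is one way to read the "parallel jobs" remark above.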