Score entity based on its likes count and creation time

Time: 2018-09-18 20:35:42

Tags: java database spring hibernate calculated-columns

When reading from the database, I want to sort my Post entities based on two factors:

  • likes count (the more the better)
  • age (the newer the better)

Currently I have implemented it this way (as a calculated value):

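// javax.* packages as used with Hibernate 5; newer versions use jakarta.*
import java.time.LocalDateTime;

import javax.persistence.Entity;
import javax.validation.constraints.Min;

import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.Formula;

@Entity
public class Post {

    // UNIX_TIMESTAMP() returns seconds, so divide by the number of seconds in a day:
    // each day of age lowers a post's score by 1 relative to newer posts
    @Formula("UNIX_TIMESTAMP(creation_date_time) / (24 * 60 * 60) + likes_count")
    private long score;

    @CreationTimestamp
    private LocalDateTime creationDateTime;

    @Min(0)
    private long likesCount;
}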
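
To make the intended trade-off concrete, here is a minimal plain-Java sketch of the arithmetic the comment describes (timestamp in seconds divided by the length of a day, plus likes). `ScoreSketch` and `score()` are hypothetical names that are not part of the entity, and treating the creation time as UTC is an assumption.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, not part of the Post entity: mirrors the intended
// score arithmetic in plain Java so the likes-versus-age trade-off is easy to check.
final class ScoreSketch {

    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Whole days since the Unix epoch plus the number of likes.
    // Assumption: creation times are interpreted as UTC.
    // Note: Java long division truncates to whole days.
    static long score(LocalDateTime creationDateTime, long likesCount) {
        long epochSeconds = creationDateTime.toEpochSecond(ZoneOffset.UTC);
        return epochSeconds / SECONDS_PER_DAY + likesCount;
    }

    public static void main(String[] args) {
        LocalDateTime older = LocalDateTime.of(2018, 9, 17, 12, 0);
        LocalDateTime newer = LocalDateTime.of(2018, 9, 18, 12, 0);

        // One extra like exactly offsets one extra day of age.
        System.out.println(score(older, 1) == score(newer, 0)); // true
        // With equal likes, the newer post ranks higher.
        System.out.println(score(newer, 5) > score(older, 5));  // true
    }
}
```

In other words, under this formula one like is worth exactly one day of age.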

It works fine but may not be the best approach because:

  1. I think the RDBMS cannot build an index on the score attribute, because it is computed at query time.
  2. The hard-coded function UNIX_TIMESTAMP() is specific to MySQL. So this will cause problems if I want to use another database (say H2) in my test environment.

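2 Answers:

Answer 0 (score: 1)

I think this could be an interesting solution that helps you keep the scores up to date. Create a scheduler that runs at a fixed interval (for example, every day at 1 a.m.) and goes through all posts to recompute their scores, so the stored scores stay current.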

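import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class Scheduler {

    @Autowired
    PostService postService;

    // Daily routine that starts at 1:00 am.
    @Scheduled(cron = "0 0 1 * * *")
    public void updateDateScore() {

        // Of course, I wouldn't recommend doing this all at once.
        // I would do it in batches, but this is just to give you an idea.
        List<Post> posts = postService.getAll();
        for (Post p : posts) {
            LocalDateTime time = p.getCreationDateTime();
            ZoneId zoneId = ZoneId.systemDefault();
            // Seconds since the Unix epoch.
            long epoch = time.atZone(zoneId).toEpochSecond();
            // Your formula: whole days since the epoch plus likes.
            long score = epoch / (24 * 60 * 60) + p.getLikesCount();
            p.setScore(score);
            postService.update(p);
        }
    }
}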

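For the scheduled task to work, you must add the @EnableScheduling annotation to your main class. This works on every RDBMS, so you don't have to worry about which database you are using, and you will always have an up-to-date index.

Suggestions

  • This should be done in batches; it will perform much better.
  • I would definitely paginate my getPost() method, so that each loop iteration fetches only a reasonable number of posts to update.
  • I would also set a maximum age for the posts that get fetched. After a certain amount of time, a post probably isn't that important anyway.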
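
The suggestions above could be sketched roughly as follows. This is a plain-Java illustration rather than Spring code: `BatchedScoreUpdate`, its nested `Post` stand-in, `updateScores()`, the page size, and the cutoff date are all hypothetical, and the in-memory list stands in for what would be paged database queries in a real service.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

final class BatchedScoreUpdate {

    // Minimal stand-in for the Post entity (hypothetical; no JPA involved).
    static final class Post {
        final LocalDateTime creationDateTime;
        final long likesCount;
        long score;

        Post(LocalDateTime creationDateTime, long likesCount) {
            this.creationDateTime = creationDateTime;
            this.likesCount = likesCount;
        }
    }

    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Recomputes scores one page at a time and skips posts created before the cutoff.
    // Returns the number of posts that were updated.
    static int updateScores(List<Post> allPosts, int pageSize, LocalDateTime cutoff) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int updated = 0;
        for (int from = 0; from < allPosts.size(); from += pageSize) {
            int to = Math.min(from + pageSize, allPosts.size());
            // In a real service, each sublist would be one paged query,
            // ideally with the cutoff applied in the query itself.
            for (Post p : allPosts.subList(from, to)) {
                if (p.creationDateTime.isBefore(cutoff)) {
                    continue; // too old to matter, leave its score alone
                }
                // Assumption: creation times are interpreted as UTC.
                long epochSeconds = p.creationDateTime.toEpochSecond(ZoneOffset.UTC);
                p.score = epochSeconds / SECONDS_PER_DAY + p.likesCount;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Post> posts = new ArrayList<>();
        posts.add(new Post(LocalDateTime.of(2018, 9, 18, 12, 0), 3));
        posts.add(new Post(LocalDateTime.of(2018, 8, 1, 12, 0), 10)); // older than the cutoff
        posts.add(new Post(LocalDateTime.of(2018, 9, 17, 12, 0), 0));

        int updated = updateScores(posts, 2, LocalDateTime.of(2018, 9, 1, 0, 0));
        System.out.println("updated " + updated + " posts");
    }
}
```

Processing a bounded page at a time keeps memory use and transaction size predictable, and the cutoff shrinks the amount of work done on each run.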

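Answer 1 (score: 1)

Use database triggers to update/maintain those side summary tables. Running heavy scheduled jobs (which cause load spikes) for this kind of thing really makes no sense...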

Also, the expression below will never use an index in a WHERE clause. Never.

UNIX_TIMESTAMP(creation_date_time) / (24 * 60 * 60) + likes_count
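
One way to read the trigger suggestion: the score depends only on the creation time and the likes count, so a stored score goes stale only when likes_count changes and can be refreshed at that moment. The sketch below shows that idea in plain Java as an application-side stand-in for a trigger, not a trigger itself. `ScoredPost`, `like()`, `unlike()`, and `recomputeScore()` are hypothetical names.

```java
// Hypothetical stand-in for a row whose score column is maintained on write,
// the way a database trigger on likes_count would maintain it.
final class ScoredPost {

    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    private final long creationEpochSeconds;
    private long likesCount;
    private long score;

    ScoredPost(long creationEpochSeconds) {
        this.creationEpochSeconds = creationEpochSeconds;
        recomputeScore(); // corresponds to an insert trigger
    }

    void like() {
        likesCount++;
        recomputeScore(); // corresponds to an update trigger
    }

    void unlike() {
        if (likesCount > 0) {
            likesCount--;
            recomputeScore();
        }
    }

    long getScore() {
        return score;
    }

    // Whole days since the Unix epoch plus likes, stored rather than computed per query.
    private void recomputeScore() {
        score = creationEpochSeconds / SECONDS_PER_DAY + likesCount;
    }

    public static void main(String[] args) {
        // 2018-09-17 12:00:00 UTC expressed as epoch seconds.
        ScoredPost post = new ScoredPost(17791L * SECONDS_PER_DAY + 43200);
        post.like();
        post.like();
        System.out.println(post.getScore());
    }
}
```

Because score is then an ordinary stored column, it can be indexed and sorted on directly. In a JPA entity, a similar recomputation could probably live in a method annotated with @PrePersist and @PreUpdate, although that only covers writes that go through the entity.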