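Given the class and data below, I want to compute the sum of counts for each consecutive 3-hour sliding window, similar to the result shown further down: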
public class Log {
private int id;
private LocalDateTime timestamp;
private int count;
}
id timestamp count
1 2018-10-10T08:00:00 12
2 2018-10-10T08:30:00 5
3 2018-10-10T08:45:00 7
4 2018-10-10T09:10:00 9
5 2018-10-10T09:50:00 3
6 2018-10-10T10:15:00 8
7 2018-10-10T12:00:00 6
8 2018-10-10T12:30:00 1
9 2018-10-10T12:45:00 2
10 2018-10-10T17:30:00 4
11 2018-10-10T17:35:00 7
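The logs are sorted by timestamp in ascending order. For each record, starting from the first, I want the sum of count over the 3-hour window that begins at that record's timestamp (a window can span different dates). The result would be: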
2018-10-10T08:00:00 ~ 2018-10-10T10:59:00 12+5+7+9+3+8
2018-10-10T08:30:00 ~ 2018-10-10T11:29:00 5+7+9+3+8
2018-10-10T08:45:00 ~ 2018-10-10T11:44:00 7+9+3+8
2018-10-10T09:10:00 ~ 2018-10-10T12:09:00 9+3+8+6
2018-10-10T09:50:00 ~ 2018-10-10T12:49:00 3+8+6+1+2
2018-10-10T10:15:00 ~ 2018-10-10T13:14:00 8+6+1+2
...
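I have some sample code below, but I feel it is inefficient when there are many logs, because for every log I have to filter and compare the timestamps of all logs. How can I compare only from the current log onward to the end?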
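// requires java.util.List and java.util.stream.Collectors
List<Log> logs = List.of(/* log entries */);
var sums = logs.stream().map(log -> {
    var start = log.getTimeStamp();
    var end = log.getTimeStamp().plusHours(3);
    // scans the whole list again for every log, hence O(n^2) overall
    var logsWithinWindow = logs.stream().filter(l -> isWithinRange(start, end, l.getTimeStamp()));
    return logsWithinWindow.mapToInt(Log::getCount).sum();
}).collect(Collectors.toList());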
Answer 0 (score: 1)
If you want to sum the log counts within an arbitrary duration, you could use:
int countLogsInDuration(List<Log> logs, LocalDateTime start, LocalDateTime end) {
    return logs.stream()
            .filter(log -> isWithinRange(log.getTimeStamp(), start, end))
            .mapToInt(Log::getCount)
            .sum();
}
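which relies on:
private static boolean isWithinRange(LocalDateTime logTimestamp, LocalDateTime start, LocalDateTime end) {
    // Assumption: start is inclusive and end is exclusive; adjust if your windows differ.
    return !logTimestamp.isBefore(start) && logTimestamp.isBefore(end);
}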
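As a rough usage sketch, the two methods above can be exercised on the first few rows of the sample data. The `WindowCountExample` wrapper, the `Log` constructor that takes an ISO string, and the getters are assumptions made for illustration. The half-open interval (start inclusive, end exclusive) is one plausible reading of the expected output, not something the question states.

```java
import java.time.LocalDateTime;
import java.util.List;

class WindowCountExample {

    // Hypothetical stand-in for the question's Log class; the getters are assumed.
    static class Log {
        private final LocalDateTime timestamp;
        private final int count;

        Log(String timestamp, int count) {
            this.timestamp = LocalDateTime.parse(timestamp);
            this.count = count;
        }

        LocalDateTime getTimeStamp() { return timestamp; }
        int getCount() { return count; }
    }

    // Assumption: start is inclusive, end is exclusive.
    static boolean isWithinRange(LocalDateTime ts, LocalDateTime start, LocalDateTime end) {
        return !ts.isBefore(start) && ts.isBefore(end);
    }

    static int countLogsInDuration(List<Log> logs, LocalDateTime start, LocalDateTime end) {
        return logs.stream()
                .filter(log -> isWithinRange(log.getTimeStamp(), start, end))
                .mapToInt(Log::getCount)
                .sum();
    }

    public static void main(String[] args) {
        List<Log> logs = List.of(
                new Log("2018-10-10T08:00:00", 12),
                new Log("2018-10-10T08:30:00", 5),
                new Log("2018-10-10T08:45:00", 7),
                new Log("2018-10-10T09:10:00", 9),
                new Log("2018-10-10T09:50:00", 3),
                new Log("2018-10-10T10:15:00", 8),
                new Log("2018-10-10T12:00:00", 6));

        // Window [08:00, 11:00): 12+5+7+9+3+8; the 12:00 entry falls outside.
        LocalDateTime start = logs.get(0).getTimeStamp();
        System.out.println(countLogsInDuration(logs, start, start.plusHours(3))); // 44
    }
}
```

Each call still scans the whole list, so calling it once per log costs O(n^2) in total, which is the inefficiency the question is about.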
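Also, at least in your case, recomputing the sum for every 3-hour window seems redundant, since the window appears to slide in 30-minute steps. You could instead compute the count once per 30-minute interval, for example 8:00 to 8:30, then 8:30 to 9:00, and so on. That avoids recounting the same logs wherever a sliding window overlaps the previous one.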
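The same idea of reusing the overlap between consecutive windows also works without fixed 30-minute buckets, which matters here because the sample timestamps are not aligned to half hours. The sketch below uses a two-pointer sliding window over the sorted list, which is a different technique from the bucketing described above. `SlidingWindowSums`, `windowSums`, `sampleLogs`, the string-based `Log` constructor and the getters are hypothetical names. The window is assumed to be half-open (start inclusive, end exclusive) and the input is assumed to be sorted in ascending order.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

class SlidingWindowSums {

    // Hypothetical stand-in for the question's Log class; the getters are assumed.
    static class Log {
        private final int id;
        private final LocalDateTime timestamp;
        private final int count;

        Log(int id, String timestamp, int count) {
            this.id = id;
            this.timestamp = LocalDateTime.parse(timestamp);
            this.count = count;
        }

        LocalDateTime getTimeStamp() { return timestamp; }
        int getCount() { return count; }
    }

    // For each log i, sums the counts of logs with a timestamp in
    // [logs[i].timestamp, logs[i].timestamp + window).
    // Assumes logs are sorted ascending and window is positive.
    static List<Integer> windowSums(List<Log> logs, Duration window) {
        List<Integer> sums = new ArrayList<>(logs.size());
        int right = 0;   // first index that is not yet part of the current window
        int running = 0; // sum of counts for indices [i, right)
        for (int i = 0; i < logs.size(); i++) {
            LocalDateTime end = logs.get(i).getTimeStamp().plus(window);
            // Extend the right edge; it never moves backwards across iterations.
            while (right < logs.size() && logs.get(right).getTimeStamp().isBefore(end)) {
                running += logs.get(right).getCount();
                right++;
            }
            sums.add(running);
            running -= logs.get(i).getCount(); // log i drops out of the next window
        }
        return sums;
    }

    // The sample rows from the question.
    static List<Log> sampleLogs() {
        return List.of(
                new Log(1, "2018-10-10T08:00:00", 12),
                new Log(2, "2018-10-10T08:30:00", 5),
                new Log(3, "2018-10-10T08:45:00", 7),
                new Log(4, "2018-10-10T09:10:00", 9),
                new Log(5, "2018-10-10T09:50:00", 3),
                new Log(6, "2018-10-10T10:15:00", 8),
                new Log(7, "2018-10-10T12:00:00", 6),
                new Log(8, "2018-10-10T12:30:00", 1),
                new Log(9, "2018-10-10T12:45:00", 2),
                new Log(10, "2018-10-10T17:30:00", 4),
                new Log(11, "2018-10-10T17:35:00", 7));
    }

    public static void main(String[] args) {
        // prints [44, 32, 27, 26, 20, 17, 9, 3, 2, 11, 7]
        System.out.println(windowSums(sampleLogs(), Duration.ofHours(3)));
    }
}
```

Both indices only move forward, so the whole pass is O(n) rather than O(n^2). Note that the window starting at 09:50 includes the 12:45 entry, since 12:45 is earlier than 12:50.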