Java: Adding two or more time series

Asked: 2018-03-06 12:12:21

Tags: java time-series java-stream

I have multiple time series:

       x
|    date    | value |
| 2017-01-01 |   1   |
| 2017-01-05 |   4   |
|     ...    |  ...  |

       y
|    date    | value |
| 2017-01-03 |   3   |
| 2017-01-04 |   2   |
|     ...    |  ...  |

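Frustratingly, in my data set the dates do not always match across the series. When a date is missing from one series, I want to use that series' last available value before that date (or 0 if there is none). For example, for 2017-01-03 I would use y = 3 and x = 1 (carried forward from the previous date) to get output = 3 + 1 = 4.

Each of my time series has: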

class Timeseries {
    List<Event> x = ...;
}

class Event {
    LocalDate date;
    Double value;
}

These have been read into a List<Timeseries> allSeries.

I thought I could sum them using streams:

List<Timeseries> allSeries = ...
Map<LocalDate, Double> byDate = allSeries.stream()
    .flatMap(s -> s.getEvents().stream())
    .collect(Collectors.groupingBy(Event::getDate, Collectors.summingDouble(Event::getValue)));

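But this does not give me the missing-date logic described above: a date that appears in only one series is summed from that series alone.

How can I achieve this? (It does not have to use streams.)

3 Answers:

Answer 0: (score: 3)

I would say you need to extend the Timeseries class with an appropriate query method.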

class Timeseries {
    // Events indexed by date, kept sorted so the closest earlier date can be found
    private final SortedMap<LocalDate, Double> eventValues = new TreeMap<>();
    private final List<Event> eventList;

    public Timeseries(List<Event> events) {
        events.forEach(e -> eventValues.put(e.getDate(), e.getValue()));
        eventList = new ArrayList<>(events);
    }

    public List<Event> getEvents() {
        return Collections.unmodifiableList(eventList);
    }

    public Double getValueByDate(LocalDate date) {
        Double value = eventValues.get(date);
        if (value == null) {
            // get values before the requested date
            SortedMap<LocalDate, Double> head = eventValues.headMap(date);
            value = head.isEmpty()
                ? 0.0   // none before
                : head.get(head.lastKey());  // closest one before
        }
        return value;
    }
}

Then merge:

Map<LocalDate, Double> values = new TreeMap<>();
List<LocalDate> allDates = allSeries.stream()
    .flatMap(s -> s.getEvents().stream())
    .map(Event::getDate)
    .distinct()
    .collect(Collectors.toList());

for (LocalDate date : allDates) {
    for (Timeseries series : allSeries) {
        values.merge(date, series.getValueByDate(date), Double::sum);
    }
}

Edit: actually, the NavigableMap interface is more useful here, since it handles the missing-data case directly:

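// Assumes eventValues is declared as NavigableMap<LocalDate, Double>.
// floorEntry returns the entry for the date itself or, failing that, the closest
// earlier one (ceilingKey would look forward in time and returns a key, not an entry).
Map.Entry<LocalDate, Double> floor = eventValues.floorEntry(date);
Double value = floor != null ? floor.getValue() : 0.0;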
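
To see the whole approach in one place, here is a minimal, self-contained sketch. It is not the code from this answer: it drops the Timeseries and Event classes and assumes each series is already a NavigableMap<LocalDate, Double>, and the class name CarryForwardSum is made up for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
class CarryForwardSum {

    // Sums several series, carrying each series' last known value forward.
    // Assumption: every series is a sorted map of date -> value.
    static NavigableMap<LocalDate, Double> sum(List<NavigableMap<LocalDate, Double>> allSeries) {
        // Union of every date that appears in any series, in ascending order.
        TreeSet<LocalDate> allDates = new TreeSet<>();
        allSeries.forEach(series -> allDates.addAll(series.keySet()));

        NavigableMap<LocalDate, Double> result = new TreeMap<>();
        for (LocalDate date : allDates) {
            double total = 0.0;
            for (NavigableMap<LocalDate, Double> series : allSeries) {
                // floorEntry: entry with the greatest key <= date, or null if there is none.
                Map.Entry<LocalDate, Double> last = series.floorEntry(date);
                total += (last != null) ? last.getValue() : 0.0;
            }
            result.put(date, total);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, Double> x = new TreeMap<>();
        x.put(LocalDate.of(2017, 1, 1), 1.0);
        x.put(LocalDate.of(2017, 1, 5), 4.0);

        NavigableMap<LocalDate, Double> y = new TreeMap<>();
        y.put(LocalDate.of(2017, 1, 3), 3.0);
        y.put(LocalDate.of(2017, 1, 4), 2.0);

        List<NavigableMap<LocalDate, Double>> all = new ArrayList<>();
        all.add(x);
        all.add(y);

        // prints {2017-01-01=1.0, 2017-01-03=4.0, 2017-01-04=3.0, 2017-01-05=6.0}
        System.out.println(sum(all));
    }
}
```

For the sample rows in the question, 2017-01-03 comes out as 3 + 1 = 4, which matches the expected result given there.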

Answer 1: (score: 1)

One way is to make Event comparable by date and use the floor method of TreeSet:

class Event implements Comparable<Event> {
        // ... 
        @Override
        public int compareTo(Event o) {
            return date.compareTo(o.date);
        }
}

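Then, in the Timeseries class, use a TreeSet<Event> x instead of a List, and seed it with a zero-valued sentinel entry so that floor returns it when there is no earlier value: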

class Timeseries {
        public static final Event ZERO = new Event(LocalDate.of(1, 1, 1), 0d);
        TreeSet<Event> x = new TreeSet<>(Arrays.asList(ZERO));

        // ...
}

Now collect all known events and compute the sums:

TreeSet<Event> events = allSeries.stream()
        .flatMap(s -> s.getEvents().stream())
        .collect(Collectors.toCollection(TreeSet::new));

Map<LocalDate, Double> sumsByDate = events.stream()
        .map(event -> new AbstractMap.SimpleEntry<>(
                event.getDate(),
                allSeries.stream()
                        .mapToDouble(series -> series.getEvents().floor(event).getValue())
                        .sum()))
        .filter(entry -> !entry.getKey().equals(Timeseries.ZERO.getDate()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
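
The lookup this relies on is easier to check in isolation. The sketch below shows only what floor returns once a sentinel is present. The names FloorLookup, series and valueOn are made up for illustration, its Event is a simplified stand-in for the class above, and it orders the set with a Comparator rather than Comparable, which should behave the same way for this purpose.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical demo class, for illustration only.
class FloorLookup {

    // Simplified stand-in for the Event class used in this answer.
    static final class Event {
        final LocalDate date;
        final double value;

        Event(LocalDate date, double value) {
            this.date = date;
            this.value = value;
        }
    }

    // Events are ordered by date only, so two events on the same date count as equal.
    static final Comparator<Event> BY_DATE = Comparator.comparing(e -> e.date);

    // Sentinel that sorts before every real event and contributes 0 to a sum.
    static final Event ZERO = new Event(LocalDate.MIN, 0.0);

    // Builds a series that always contains the sentinel.
    static TreeSet<Event> series(Event... events) {
        TreeSet<Event> set = new TreeSet<>(BY_DATE);
        set.add(ZERO);
        for (Event e : events) {
            set.add(e);
        }
        return set;
    }

    // Value in effect on the given date: the latest event on or before it.
    static double valueOn(TreeSet<Event> series, LocalDate date) {
        // Only the probe's date takes part in the comparison; its value is ignored.
        return series.floor(new Event(date, 0.0)).value;
    }

    public static void main(String[] args) {
        TreeSet<Event> x = series(
                new Event(LocalDate.of(2017, 1, 1), 1.0),
                new Event(LocalDate.of(2017, 1, 5), 4.0));

        System.out.println(valueOn(x, LocalDate.of(2017, 1, 3)));   // prints 1.0
        System.out.println(valueOn(x, LocalDate.of(2016, 12, 31))); // prints 0.0 (sentinel)
    }
}
```

Because of the sentinel, floor never returns null here, so no null check is needed when summing across series.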

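Answer 2: (score: 0)

So I managed to do part of it with streams. It does not look particularly efficient, though, because getRelevantValueFor repeats a lot of work: it goes through the whole series again for every date. I am hoping there is a more efficient solution.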

public Timeseries combine(List<Timeseries> allSeries) {

    // Get a unique set of all the dates across all time series
    Set<LocalDate> allDates = allSeries.stream()
        .flatMap(t -> t.getEvents().stream())
        .map(Event::getDate)
        .collect(Collectors.toSet());

    Timeseries output = new Timeseries();

    // For each date sum up the latest event in each timeseries
    allDates.forEach(date -> {
        double total = 0;
        for(Timeseries series : allSeries) {
            total += getRelevantValueFor(series, date).orElse(0.0);
        }
        output.add(new Event(date, total));
    });
    return output;
}

private Optional<Double> getRelevantValueFor(Timeseries series, LocalDate date) {
    // Latest event on or before the given date, if any
    return series.getEvents().stream()
        .filter(event -> !event.getDate().isAfter(date))
        .max(ascendingOrder())
        .map(Event::getValue);
}

private Comparator<Event> ascendingOrder() {
    // LocalDate has no toEpochMilli(); compare the dates directly
    return Comparator.comparing(Event::getDate);
}
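
One possible way to avoid rescanning each series for every date is to walk the dates once in order and keep a running total. The sketch below is only an illustration under assumptions. It uses a simplified Event and plain lists instead of the Timeseries class, and the names SweepSum and combine are made up.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class SweepSum {

    // Simplified stand-in for the Event class.
    static final class Event {
        final LocalDate date;
        final double value;

        Event(LocalDate date, double value) {
            this.date = date;
            this.value = value;
        }
    }

    static SortedMap<LocalDate, Double> combine(List<List<Event>> allSeries) {
        // Group value changes by date: date -> (series index -> new value).
        TreeMap<LocalDate, Map<Integer, Double>> changes = new TreeMap<>();
        for (int i = 0; i < allSeries.size(); i++) {
            for (Event e : allSeries.get(i)) {
                changes.computeIfAbsent(e.date, d -> new HashMap<>()).put(i, e.value);
            }
        }

        // Last value seen per series; 0.0 until the series has its first event.
        double[] current = new double[allSeries.size()];
        double runningTotal = 0.0;

        SortedMap<LocalDate, Double> result = new TreeMap<>();
        // TreeMap iterates in ascending date order, so values carry forward naturally.
        for (Map.Entry<LocalDate, Map<Integer, Double>> entry : changes.entrySet()) {
            for (Map.Entry<Integer, Double> change : entry.getValue().entrySet()) {
                int index = change.getKey();
                // Swap the old contribution of this series for the new one.
                runningTotal += change.getValue() - current[index];
                current[index] = change.getValue();
            }
            result.put(entry.getKey(), runningTotal);
        }
        return result;
    }
}
```

Each event is handled once after being sorted into the TreeMap, instead of once per date. The running total can pick up floating-point rounding error on long inputs; summing the current array for every date avoids that at the cost of extra work.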