I have multiple time series:
x
| date | value |
| 2017-01-01 | 1 |
| 2017-01-05 | 4 |
| ... | ... |
y
| date | value |
| 2017-01-03 | 3 |
| 2017-01-04 | 2 |
| ... | ... |
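Frustratingly, in my data set the dates do not always match across the two series. Where a date is missing from one series, I want to use the value from its last available date (or 0 if there is none).
For example, for 2017-01-03 I would use y=3 and x=1 (carried over from the earlier date 2017-01-01) to get output = 3 + 1 = 4.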
Each of my time series is held in:
class Timeseries {
    List<Event> x = ...;
}
class Event {
    LocalDate date;
    Double value;
}
and I have read them into a List<Timeseries> allSeries.
I thought I might be able to sum them using streams:
List<Timeseries> allSeries = ...;
Map<LocalDate, Double> byDate = allSeries.stream()
        .flatMap(s -> s.getEvents().stream())
        .collect(Collectors.groupingBy(Event::getDate, Collectors.summingDouble(Event::getValue)));
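But this does not give me the missing-date logic described above: it only sums values that fall on exactly the same date.
How else could I achieve this? (It does not have to use streams.)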
Answer 0 (score: 3)
I'd say you need to extend the Timeseries class with an appropriate query function.
import java.time.LocalDate;
import java.util.*;
import java.util.stream.Collectors;

class Timeseries {
    // Values keyed by date; the sorted map lets us look up earlier dates
    private final SortedMap<LocalDate, Double> eventValues = new TreeMap<>();
    private final List<Event> eventList;

    public Timeseries(List<Event> events) {
        events.forEach(e -> eventValues.put(e.getDate(), e.getValue()));
        eventList = new ArrayList<>(events);
    }

    public List<Event> getEvents() {
        return Collections.unmodifiableList(eventList);
    }

    public Double getValueByDate(LocalDate date) {
        Double value = eventValues.get(date);
        if (value == null) {
            // get the values strictly before the requested date
            SortedMap<LocalDate, Double> head = eventValues.headMap(date);
            value = head.isEmpty()
                    ? 0.0                        // none before
                    : head.get(head.lastKey());  // closest one before
        }
        return value;
    }
}
Then merge:
Map<LocalDate, Double> values = new TreeMap<>();
List<LocalDate> allDates = allSeries.stream()
        .flatMap(s -> s.getEvents().stream())
        .map(Event::getDate)
        .distinct()
        .collect(Collectors.toList());
for (LocalDate date : allDates) {
    for (Timeseries series : allSeries) {
        values.merge(date, series.getValueByDate(date), Double::sum);
    }
}
Edit: actually, the NavigableMap interface is more useful here, because it simplifies the missing-data case:
// assumes eventValues is declared as NavigableMap<LocalDate, Double>
Double value = eventValues.get(date);
if (value == null) {
    // floorEntry returns the entry with the greatest key <= date, or null
    Map.Entry<LocalDate, Double> floor = eventValues.floorEntry(date);
    value = floor != null ? floor.getValue() : 0.0;
}
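To see the floor lookup applied end to end, here is a minimal, self-contained sketch run against the sample data from the question. The class name FloorSumSketch and the helpers valueAt, sumAll and demo are hypothetical names chosen for illustration, and each series is assumed to be stored directly as a NavigableMap rather than wrapped in the Timeseries class.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

class FloorSumSketch {

    // Value on `date`, else the most recent earlier value, else 0.0.
    static double valueAt(NavigableMap<LocalDate, Double> series, LocalDate date) {
        Map.Entry<LocalDate, Double> floor = series.floorEntry(date);
        return floor != null ? floor.getValue() : 0.0;
    }

    // Sums all series on every date that occurs in at least one of them.
    static NavigableMap<LocalDate, Double> sumAll(List<NavigableMap<LocalDate, Double>> allSeries) {
        TreeSet<LocalDate> allDates = new TreeSet<>();
        for (NavigableMap<LocalDate, Double> series : allSeries) {
            allDates.addAll(series.keySet());
        }
        NavigableMap<LocalDate, Double> totals = new TreeMap<>();
        for (LocalDate date : allDates) {
            double sum = 0.0;
            for (NavigableMap<LocalDate, Double> series : allSeries) {
                sum += valueAt(series, date);
            }
            totals.put(date, sum);
        }
        return totals;
    }

    // Sample data copied from the question (series x and y).
    static NavigableMap<LocalDate, Double> demo() {
        NavigableMap<LocalDate, Double> x = new TreeMap<>();
        x.put(LocalDate.of(2017, 1, 1), 1.0);
        x.put(LocalDate.of(2017, 1, 5), 4.0);
        NavigableMap<LocalDate, Double> y = new TreeMap<>();
        y.put(LocalDate.of(2017, 1, 3), 3.0);
        y.put(LocalDate.of(2017, 1, 4), 2.0);
        return sumAll(Arrays.asList(x, y));
    }

    public static void main(String[] args) {
        // prints {2017-01-01=1.0, 2017-01-03=4.0, 2017-01-04=3.0, 2017-01-05=6.0}
        System.out.println(demo());
    }
}
```

The entry for 2017-01-03 is 4.0, matching the 3 + 1 = 4 example in the question.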
Answer 1 (score: 1)
One way to do it is to make Event comparable by date and use the floor method of TreeSet:
class Event implements Comparable<Event> {
    // ...
    @Override
    public int compareTo(Event o) {
        return date.compareTo(o.date);
    }
}
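Then, in the Timeseries class, use a TreeSet<Event> x instead of a List, and seed it with a null-object entry so that floor returns that entry when there is no earlier value: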
class Timeseries {
    public static final Event ZERO = new Event(LocalDate.of(1, 1, 1), 0d);
    TreeSet<Event> x = new TreeSet<>(Arrays.asList(ZERO));
    // ...
}
Now collect all known events and compute the sums:
// One representative event per distinct date (Event compares by date only)
TreeSet<Event> events = allSeries.stream()
        .flatMap(s -> s.getEvents().stream())
        .collect(Collectors.toCollection(TreeSet::new));
Map<LocalDate, Double> sumsByDate = events.stream()
        .map(event -> new AbstractMap.SimpleEntry<>(event.getDate(),
                allSeries.stream()
                        .mapToDouble(a -> a.getEvents().floor(event).getValue())
                        .sum()))
        // drop the artificial ZERO date from the result
        .filter(p -> !p.getKey().equals(Timeseries.ZERO.getDate()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
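To illustrate why the ZERO entry matters, here is a small self-contained sketch of the floor lookup on series y from the question. FloorDemo, valueOn and sampleY are hypothetical names, and the Event class below is a simplified stand-in with public final fields rather than the poster's actual class.

```java
import java.time.LocalDate;
import java.util.TreeSet;

class FloorDemo {

    // Simplified stand-in for Event, ordered by date only.
    static final class Event implements Comparable<Event> {
        final LocalDate date;
        final double value;

        Event(LocalDate date, double value) {
            this.date = date;
            this.value = value;
        }

        @Override
        public int compareTo(Event o) {
            return date.compareTo(o.date);
        }
    }

    // Sentinel that sorts before any realistic date and contributes 0.
    static final Event ZERO = new Event(LocalDate.of(1, 1, 1), 0d);

    // floor() needs an Event to compare against, so build a probe for the date;
    // the probe's value is irrelevant because ordering uses the date only.
    static double valueOn(TreeSet<Event> series, LocalDate date) {
        return series.floor(new Event(date, 0d)).value;
    }

    // Series y from the question, seeded with the sentinel.
    static TreeSet<Event> sampleY() {
        TreeSet<Event> y = new TreeSet<>();
        y.add(ZERO);
        y.add(new Event(LocalDate.of(2017, 1, 3), 3d));
        y.add(new Event(LocalDate.of(2017, 1, 4), 2d));
        return y;
    }

    public static void main(String[] args) {
        TreeSet<Event> y = sampleY();
        System.out.println(valueOn(y, LocalDate.of(2017, 1, 1))); // 0.0, only ZERO is earlier
        System.out.println(valueOn(y, LocalDate.of(2017, 1, 3))); // 3.0, exact match
        System.out.println(valueOn(y, LocalDate.of(2017, 1, 5))); // 2.0, carried from 2017-01-04
    }
}
```

Without the sentinel, floor would return null for 2017-01-01 and the lookup would throw a NullPointerException.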
Answer 2 (score: 0)
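So I managed to do it partly with streams. It does not look particularly efficient though, because getRelevantValueFor rescans and re-compares the whole series for every date. I would prefer a more efficient solution.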
public Timeseries combine(List<Timeseries> allSeries) {
    // Get a unique set of all the dates across all time series
    Set<LocalDate> allDates = allSeries.stream()
            .flatMap(t -> t.getEvents().stream())
            .map(Event::getDate)
            .collect(Collectors.toSet());
    Timeseries output = new Timeseries();
    // For each date, sum up the latest event in each time series
    allDates.forEach(date -> {
        double total = 0;
        for (Timeseries series : allSeries) {
            total += getRelevantValueFor(series, date).orElse(0.0);
        }
        output.add(new Event(date, total));
    });
    return output;
}

// Latest event on or before the given date, if any
private Optional<Double> getRelevantValueFor(Timeseries series, LocalDate date) {
    return series.getEvents().stream()
            .filter(event -> !event.getDate().isAfter(date))
            .max(ascendingOrder())
            .map(Event::getValue);
}

// LocalDate is Comparable, so events can be ordered by date directly
private Comparator<Event> ascendingOrder() {
    return Comparator.comparing(Event::getDate);
}
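One possible way to avoid rescanning each series for every date is a single forward sweep: walk the dates in ascending order and keep, per series, an index and the last value seen. The following is only a sketch of that alternative, not the poster's code; SweepSumSketch, combine and demo are hypothetical names, and each series is assumed to be available as a SortedMap from date to value.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class SweepSumSketch {

    static SortedMap<LocalDate, Double> combine(List<SortedMap<LocalDate, Double>> allSeries) {
        // Every date that occurs in any series, in ascending order.
        TreeSet<LocalDate> allDates = new TreeSet<>();
        // Each series flattened to a date-ordered list so it can be walked by index.
        List<List<Map.Entry<LocalDate, Double>>> entries = new ArrayList<>();
        for (SortedMap<LocalDate, Double> series : allSeries) {
            allDates.addAll(series.keySet());
            entries.add(new ArrayList<>(series.entrySet()));
        }

        int[] next = new int[entries.size()];          // next unread entry per series
        double[] current = new double[entries.size()]; // last value seen, 0.0 until the first event

        SortedMap<LocalDate, Double> totals = new TreeMap<>();
        for (LocalDate date : allDates) {
            double sum = 0.0;
            for (int i = 0; i < entries.size(); i++) {
                List<Map.Entry<LocalDate, Double>> series = entries.get(i);
                // Consume every event dated on or before the current date.
                while (next[i] < series.size() && !series.get(next[i]).getKey().isAfter(date)) {
                    current[i] = series.get(next[i]).getValue();
                    next[i]++;
                }
                sum += current[i];
            }
            totals.put(date, sum);
        }
        return totals;
    }

    // Sample data copied from the question (series x and y).
    static SortedMap<LocalDate, Double> demo() {
        SortedMap<LocalDate, Double> x = new TreeMap<>();
        x.put(LocalDate.of(2017, 1, 1), 1.0);
        x.put(LocalDate.of(2017, 1, 5), 4.0);
        SortedMap<LocalDate, Double> y = new TreeMap<>();
        y.put(LocalDate.of(2017, 1, 3), 3.0);
        y.put(LocalDate.of(2017, 1, 4), 2.0);
        return combine(Arrays.asList(x, y));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because each event is read exactly once, the work after sorting is roughly the number of events plus the number of dates times the number of series, instead of a full scan of every series for every date.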