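I have a map like {:datetime [unix-timestamp] :count [longs]}. :datetime and :count contain the same number of elements. The timestamps in :datetime have no fixed interval; this is usually tick data. I want to resample the data onto a defined interval, e.g. 5 minutes, and sum the :count values that fall within each interval.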
Example:
{
 :datetime [timestamp every minute]
 :count [1 1 1 1 1 ...]
}
should be resampled to
{
 :datetime [timestamp every 5 minutes]
 :count [5 5 5 5 5 ...]
}
Answer 0 (score: 0)
You want to take every fifth element of the timestamp vector and sum the count vector in groups of five. Something like this would do it:
(defn resample [m]
  (let [{dt :datetime ct :count} m
        ;; first timestamp of each group of five
        newdt (map first (partition 5 dt))
        ;; sum of each group of five counts
        newct (map (partial apply +) (partition 5 ct))]
    {:datetime newdt
     :count newct}))
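Two properties of this approach are worth keeping in mind: partition silently drops a trailing group that has fewer than five elements, and grouping by position only corresponds to five-minute intervals when the ticks are exactly one minute apart. Below is a minimal sketch of a variant that keeps the trailing partial group, still assuming evenly spaced input. The name resample-n, the parameter n, and the sample data are made up for illustration and are not part of the answer above.

```clojure
;; Hypothetical variant of resample: n is the group size, and
;; partition-all (unlike partition) keeps a final group shorter than n.
(defn resample-n [n {dt :datetime ct :count}]
  {:datetime (mapv first (partition-all n dt))
   :count    (mapv #(reduce + %) (partition-all n ct))})

;; Seven one-minute ticks (timestamps in milliseconds), one count each.
(def ticks
  {:datetime (vec (range 0 (* 7 60000) 60000))
   :count    (vec (repeat 7 1))})

(resample-n 5 ticks)
;; => {:datetime [0 300000], :count [5 2]}
```

The last bucket then holds the two leftover ticks instead of being discarded.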
Answer 1 (score: 0)
Here is something fancy, but probably inefficient:
(defn resample-5 [{:keys [datetime count]}]
  (letfn [;; round a millisecond timestamp down to its 5-minute bucket
          (floor-5 [dt] (- dt (mod dt (* 5 60 1000))))
          ;; [bucket pairs] -> [bucket total-count]
          (sum-counts [[time pairs]]
            [time (reduce + (map second pairs))])]
    (let [pairs (partition 2 (interleave datetime count))
          pair-groups (group-by #(floor-5 (first %)) pairs)
          sums (map sum-counts pair-groups)]
      {:datetime (map first sums)
       :count (map second sums)})))
Note how many operations it performs on the collection: interleave, partition, group-by, map + reduce, and map twice.
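One more thing to watch with the group-by approach: group-by returns an ordinary map, and larger Clojure maps make no ordering guarantee, so the buckets may not come back in chronological order. Below is a minimal sketch that accumulates into a sorted-map instead, so the buckets come out in ascending timestamp order. It assumes millisecond timestamps, as the code above does. The names resample-sorted and interval-ms are illustrative and not part of the original answer.

```clojure
;; Hypothetical sorted variant: sum counts per bucket in a sorted-map,
;; so buckets are returned in ascending timestamp order.
(defn resample-sorted [interval-ms {:keys [datetime count]}]
  (let [floor  (fn [dt] (- dt (mod dt interval-ms)))
        ;; add each tick's count to its bucket, keyed by the rounded time
        totals (reduce (fn [acc [dt c]] (update acc (floor dt) (fnil + 0) c))
                       (sorted-map)
                       (map vector datetime count))]
    {:datetime (vec (keys totals))
     :count    (vec (vals totals))}))

(resample-sorted (* 5 60 1000)
                 {:datetime [0 60000 300000 310000 900000]
                  :count    [1 1 2 3 4]})
;; => {:datetime [0 300000 900000], :count [2 5 4]}
```

Because the buckets are keyed by rounded timestamp, this version also tolerates unsorted input.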
Here is a more efficient version that scans the collection only once:
(defn resample-5 [{:keys [datetime count]}]
  (letfn [;; append a finished bucket to the result; dt is nil before the first tick
          (add-tick [result dt c]
            (if dt
              (-> result
                  (update-in [:datetime] conj dt)
                  (update-in [:count] conj c))
              result))]
    (loop [datetimes datetime
           counts count
           rounded-last nil   ; bucket currently being accumulated
           count-last 0       ; running sum for that bucket
           result {:datetime [] :count []}]
      (if (empty? datetimes)
        ;; input exhausted: flush the last open bucket
        (add-tick result rounded-last count-last)
        (let [dt (first datetimes)
              c (first counts)
              rounded (- dt (mod dt (* 5 60 1000)))]
          (if (= rounded-last rounded)
            ;; same bucket: keep accumulating
            (recur (rest datetimes) (rest counts) rounded (+ count-last c) result)
            ;; new bucket: emit the previous one and start a new sum
            (recur (rest datetimes) (rest counts) rounded c
                   (add-tick result rounded-last count-last))))))))
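The loop above only compares each tick with the bucket of the previous tick, so it relies on the timestamps being sorted in ascending order. The same "sum consecutive runs" idea can be sketched more compactly with partition-by. This is an alternative formulation under the same sorted-input and millisecond-timestamp assumptions, not the answer's own code, and the names resample-runs and interval-ms are made up for illustration.

```clojure
;; Hypothetical partition-by variant: tag each tick with its bucket,
;; then sum every run of consecutive ticks that share a bucket.
(defn resample-runs [interval-ms {:keys [datetime count]}]
  (let [floor (fn [dt] (- dt (mod dt interval-ms)))
        ;; sequence of [bucket count] pairs, one per tick
        tagged (map (fn [dt c] [(floor dt) c]) datetime count)
        ;; consecutive pairs with the same bucket form one run
        runs (partition-by first tagged)]
    {:datetime (mapv ffirst runs)
     :count    (mapv #(reduce + (map second %)) runs)}))

(resample-runs (* 5 60 1000)
               {:datetime [0 60000 300000 310000 900000]
                :count    [1 1 2 3 4]})
;; => {:datetime [0 300000 900000], :count [2 5 4]}
```

Like the loop, this produces a separate entry each time the bucket changes, so unsorted input would yield repeated buckets.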