Resampling a time series

Asked: 2014-07-06 20:37:49

Tags: clojure time-series data-analysis

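I have a map of the form {:datetime [unix-timestamp] :count [longs]}

:datetime and :count contain the same number of elements.

:datetime has no fixed interval; it is usually tick data. I want to resample the data so that it has a defined interval, e.g. 5 minutes, and sum :count over each interval.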

Example:

{
   :datetime [timestamp every minute] 
   :count [1 1 1 1 1 ...]
} 

resampled to

{
   :datetime [timestamp every 5 minutes] 
   :count [5 5 5 5 5 ...] 
}

2 Answers:

Answer 0 (score: 0)

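You want to take one element out of every five from the timestamp vector, and sum the count vector in groups of five. Something like this will do, provided the ticks are evenly spaced at one per minute as in your example: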

(defn resample [m]
  (let [{dt :datetime ct :count} m
        newdt (map first (partition 5 dt))
        newct (map (partial apply +) (partition 5 ct))]
    {:datetime newdt
     :count newct}))
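
As a quick check of the function above, here is a minimal sketch run against made-up data. The `ticks` map is hypothetical (eleven one-minute ticks, with timestamps assumed to be in seconds); the function is repeated only so the snippet runs on its own.

```clojure
;; Same approach as above, repeated so the snippet is self-contained.
(defn resample [m]
  (let [{dt :datetime ct :count} m]
    {:datetime (map first (partition 5 dt))
     :count    (map (partial apply +) (partition 5 ct))}))

;; Hypothetical sample: eleven ticks, one per minute, timestamps in seconds.
(def ticks
  {:datetime (vec (range 0 (* 11 60) 60))
   :count    (vec (repeat 11 1))})

(resample ticks)
;; => {:datetime (0 300), :count (5 5)}
```

Note that the eleventh tick is missing from the result: partition drops a trailing group that has fewer than five elements. If the leftover ticks should be kept, partition-all is the variant that retains the incomplete final group.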

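Answer 1 (score: 0)

Here is something fancy, but probably inefficient (the 5-minute interval is expressed in milliseconds, so it assumes millisecond timestamps):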

(defn resample-5 [{:keys [datetime count]}]
  (letfn [(floor-5 [dt] (- dt (mod dt (* 5 60 1000))))
          (sum-counts [[time pairs]]
                      [time (reduce + (map second pairs))])]
    (let [pairs (partition 2 (interleave datetime count))
          pair-groups (group-by #(floor-5 (first %)) pairs)
          ;; group-by returns an unordered map, so sort the buckets by time
          sums (map sum-counts (sort-by first pair-groups))]
      {:datetime (map first sums)
       :count (map second sums)})))

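Note how many passes it makes over the collection: interleave, partition, group-by, map + reduce, and then map twice.

Here is a more efficient version that scans the collection only once: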
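
The grouping above rests on flooring each timestamp to the start of its 5-minute window. A minimal sketch of just that arithmetic, assuming millisecond timestamps; `floor-to` and `five-min-ms` are hypothetical names introduced for illustration:

```clojure
;; Hypothetical helper: start of the interval-sized window that contains ts.
(defn floor-to [interval ts]
  (- ts (mod ts interval)))

;; Five minutes in milliseconds; use 300 instead if timestamps are in seconds.
(def five-min-ms (* 5 60 1000))

(map (partial floor-to five-min-ms) [0 299999 300000 450000 600001])
;; => (0 0 300000 300000 600000)
```

Every tick that floors to the same value lands in the same bucket, no matter how irregularly the ticks arrive.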

(defn resample-5 [{:keys [datetime count]}]
  (letfn [(add-tick [result dt c]
                    (if dt
                      (-> result
                          (update-in [:datetime] conj dt)
                          (update-in [:count] conj c))
                      result))]

    (loop [datetimes datetime
           counts count
           rounded-last nil
           count-last 0
           result {:datetime [] :count []}]
      (if (empty? datetimes)
        (add-tick result rounded-last count-last)
        (let [dt (first datetimes)
              c (first counts)
              rounded (- dt (mod dt (* 5 60 1000)))]
          (if (= rounded-last rounded)
            (recur (rest datetimes) (rest counts) rounded (+ count-last c) result)
            (recur (rest datetimes) (rest counts) rounded c (add-tick result rounded-last count-last))))))))
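
For comparison, here is a more compact single-pass sketch. It is not part of the answer above: it swaps the explicit loop for a reduce into a sorted-map, and `resample-by` and its `interval` parameter are hypothetical names. It assumes the timestamps and the interval use the same unit.

```clojure
(defn resample-by
  "Hypothetical generalisation: sums counts per interval-sized bucket."
  [interval {:keys [datetime count]}]
  (let [buckets (reduce (fn [acc [dt c]]
                          ;; Add c to the bucket whose start is dt floored to interval.
                          (update-in acc [(- dt (mod dt interval))] (fnil + 0) c))
                        (sorted-map) ; keeps the buckets ordered by start time
                        (map vector datetime count))]
    {:datetime (vec (keys buckets))
     :count    (vec (vals buckets))}))

;; Irregular ticks, millisecond timestamps, 5-minute buckets.
(resample-by (* 5 60 1000)
             {:datetime [1000 61000 299999 300000 905000]
              :count    [1 2 3 4 5]})
;; => {:datetime [0 300000 900000], :count [6 4 5]}
```

Unlike the loop version, this does not require the input to be sorted by time, at the cost of a sorted-map lookup per tick. Like the versions above, it emits no entry for an interval that contains no ticks (the window starting at 600000 in this example).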