Summarise (group and count) a sequence of maps

Time: 2014-06-26 09:39:39

Tags: clojure

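I am trying to find an idiomatic way in Clojure to group a sequence of maps by certain keys and return a count for each group, similar to 'SELECT X, Y, COUNT(*) FROM Z GROUP BY X, Y' in SQL. The data looks like this: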

({:status "Academy Sponsor Led",
  :pupil-population "",
  :locality "Northamptonshire",
  :pupil-gender "Mixed",
  :county "Northamptonshire",
  :pupil-age "11-18",
  :school "Wrenn School",
  :website ""}
 {:status "Academy Sponsor Led",
  :pupil-population "915",
  :locality "Plymouth",
  :pupil-gender "Mixed",
  :county "Devon",
  :pupil-age "11-19",
  :school "The All Saints Church of England Academy",
  :website "http://www.asap.org.uk/"}
 {:status "Academy Converter",
  :pupil-population "735",
  :locality "Somerset",
  :pupil-gender "Mixed",
  :county "Somerset",
  :pupil-age "11-16",
  :school "Stanchester Academy",
  :website "www.Stanchester-Academy.co.uk"}
 {:status "Community School",
  :pupil-population "",
  :locality "Herefordshire",
  :pupil-gender "Mixed",
  :county "Herefordshire",
  :pupil-age "11-18",
  :school "Lady Hawkins High School",
  :website "http://www.lhs.hereford.sch.uk"}...

My solution is as follows:

(defn summarise-locality-status
  "Return counts of status within locality"
  [data]
  (let [locality (group-by :locality data)
        locality-status (map #(vector (first %) (group-by :status (second %))) locality)
        counts-fn (fn [locality-status-item]
                    (let [statuses (second locality-status-item)]
                      (map #(vector % (count (get statuses %))) (keys statuses))))]
    (map #(vector (first %) (counts-fn %)) locality-status)))

However, it feels a bit clunky. Is there a better way to do this?

4 Answers:

Answer 0: (score: 5)

Depending on your needs,

(frequencies (for [r data] (select-keys r [:locality :status])))

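is closer to SQL, because the result is not nested: each distinct :locality/:status combination maps directly to its count.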
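To make the shape of that result concrete, here is a small sketch. The sample data is made up for illustration (it is not from the question) and carries only the two grouping keys; the rows step at the end is an optional extra, not part of the answer above.

```clojure
;; Hypothetical sample data, reduced to the two keys used for grouping.
(def sample
  [{:locality "Devon"    :status "Academy Converter"}
   {:locality "Devon"    :status "Academy Converter"}
   {:locality "Devon"    :status "Community School"}
   {:locality "Somerset" :status "Academy Converter"}])

;; Each distinct {:locality .. :status ..} map becomes a key,
;; and its value is the number of times it occurs.
(def flat-counts
  (frequencies (for [r sample] (select-keys r [:locality :status]))))

;; flat-counts is equal to:
;; {{:locality "Devon", :status "Academy Converter"} 2
;;  {:locality "Devon", :status "Community School"} 1
;;  {:locality "Somerset", :status "Academy Converter"} 1}

;; Optional: turn the result into SQL-like rows, one map per group.
(def rows
  (for [[k n] flat-counts]
    (assoc k :count n)))
```

The order of the rows is not guaranteed, so sort them explicitly if the order matters.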

Answer 1: (score: 4)

Another solution, introducing juxt and reduce-kv:

(->> data
     (group-by (juxt :locality :status))
     (reduce-kv #(assoc-in % %2 (count %3)) {}))

This is probably the closest to the original SQL, and it is more intuitive to read.
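For anyone unfamiliar with the two functions, the pipeline can be unpacked roughly like this. The sample data and the names grouped and nested-counts are made up for illustration.

```clojure
;; Hypothetical sample data, reduced to the two keys used for grouping.
(def sample
  [{:locality "Devon"    :status "Academy Converter"}
   {:locality "Devon"    :status "Academy Converter"}
   {:locality "Devon"    :status "Community School"}
   {:locality "Somerset" :status "Academy Converter"}])

;; (juxt :locality :status) returns a function that produces the vector
;; [locality status] for a record, so group-by keys each group by that pair.
(def grouped
  (group-by (juxt :locality :status) sample))

;; reduce-kv walks the grouped map, passing the accumulator, the key
;; ([locality status]) and the value (the records in that group).
;; assoc-in uses the key vector as a path into a nested map.
(def nested-counts
  (reduce-kv (fn [acc path records]
               (assoc-in acc path (count records)))
             {}
             grouped))

;; nested-counts is equal to:
;; {"Devon" {"Academy Converter" 2, "Community School" 1}
;;  "Somerset" {"Academy Converter" 1}}
```

Note that the result is a map nested by locality and then status, rather than a flat list of rows.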

Answer 2: (score: 2)

How about

(reduce #(update-in %1 [(:locality %2) (:status %2)] (fnil inc 0)) {} data)

or, building the key path with juxt:

(reduce #(update-in %1 ((juxt :locality :status) %2) (fnil inc 0)) {} data)

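The output is slightly different (a nested hash map rather than a list), but that is easy to change. Building the hash map directly makes group-by unnecessary, and the code is shorter and simpler.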
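As a sketch of the "easy to change" part: the nested map can be turned back into a list of [locality ([status count] ...)] pairs, the shape returned by the function in the question. The sample data and the names nested and as-pairs are made up for illustration.

```clojure
;; Hypothetical sample data, reduced to the two keys used for grouping.
(def sample
  [{:locality "Devon"    :status "Academy Converter"}
   {:locality "Devon"    :status "Academy Converter"}
   {:locality "Devon"    :status "Community School"}
   {:locality "Somerset" :status "Academy Converter"}])

;; (fnil inc 0) treats a missing count as 0 before incrementing,
;; so the first record seen for a path yields 1.
(def nested
  (reduce #(update-in %1 [(:locality %2) (:status %2)] (fnil inc 0))
          {}
          sample))

;; Convert {locality {status count}} into a sequence of
;; [locality ([status count] ...)] pairs.
(def as-pairs
  (for [[locality statuses] nested]
    [locality (for [[status n] statuses]
                [status n])]))
```

The order of the pairs follows the iteration order of the maps, so sort them if a stable order is needed.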

Answer 3: (score: 1)

(for [[locality statuses] (group-by :locality data)]
  {:locality locality
   :all_status (for [[status items] (group-by :status statuses)]
                 {:status status :count (count items)})})