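How can I group a nested collection by column values, where the columns to group by are supplied dynamically? For example, given the following nested collection, how do I group it by the values in the first and second columns?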
[ ["A" 2011 "Dan"]
["A" 2011 "Jon"]
["A" 2010 "Tim"]
["B" 2009 "Tom"] ]
The desired result map is:
{ A {
2011 [['A', 2011, 'Dan'] ['A', 2011, 'Jon']]
2010 [['A', 2010, 'Tim']]
}
B { 2009 [['B', 2009, 'Tom']] }
}
Here is my solution, which almost works:
(defn nest [data criteria]
  (if (empty? criteria)
    data
    (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
      (hash-map k (nest v (rest criteria))))))
Answer 0 (score: 6)
I came up with the following:
user=> (def a [["A" 2011 "Dan"]
               ["A" 2011 "Jon"]
               ["A" 2010 "Tim"]
               ["B" 2009 "Tom"]])
user=> (into {} (for [[k v] (group-by first a)]
                  [k (group-by second v)]))
{"A" {2011 [["A" 2011 "Dan"]
            ["A" 2011 "Jon"]],
      2010 [["A" 2010 "Tim"]]},
 "B" {2009 [["B" 2009 "Tom"]]}}
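Answer 1 (score: 2)
I needed a generalization of group-by that produces maps nested more than two levels deep. I wanted to be able to hand such a function an arbitrary list of functions to apply recursively through group-by. This is what I came up with: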
(defn map-function-on-map-vals
  "Take a map and apply a function on its values. From [1].
  [1] http://stackoverflow.com/a/1677069/500207"
  [m f]
  (zipmap (keys m) (map f (vals m))))

(defn nested-group-by
  "Like group-by but instead of a single function, this is given a list or vec
  of functions to apply recursively via group-by. An optional `final` argument
  (defaults to identity) may be given to run on the vector result of the final
  group-by."
  [fs coll & [final-fn]]
  (if (empty? fs)
    ((or final-fn identity) coll)
    (map-function-on-map-vals (group-by (first fs) coll)
                              #(nested-group-by (rest fs) % final-fn))))
Applied to your dataset:
cljs.user=> (def foo [ ["A" 2011 "Dan"]
#_=> ["A" 2011 "Jon"]
#_=> ["A" 2010 "Tim"]
#_=> ["B" 2009 "Tom"] ])
cljs.user=> (require '[cljs.pprint :refer [pprint]])
nil
cljs.user=> (pprint (nested-group-by [first second] foo))
{"A"
{2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]},
"B" {2009 [["B" 2009 "Tom"]]}}
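This produces exactly the desired output. nested-group-by can take three, four, or more functions, and it produces one level of hash-map nesting per function. Perhaps this will help someone else.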
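As a minimal sketch of the "three or more functions" case, the block below repeats the two definitions from this answer so that it runs on its own, then groups by all three columns. The name rows and the third grouping function #(nth % 2) are illustrative choices, not part of the original answer.

```clojure
;; Helper repeated from the answer above so this block is self-contained.
(defn map-function-on-map-vals [m f]
  (zipmap (keys m) (map f (vals m))))

(defn nested-group-by [fs coll & [final-fn]]
  (if (empty? fs)
    ((or final-fn identity) coll)
    (map-function-on-map-vals (group-by (first fs) coll)
                              #(nested-group-by (rest fs) % final-fn))))

;; Illustrative dataset (same rows as in the question).
(def rows [["A" 2011 "Dan"]
           ["A" 2011 "Jon"]
           ["A" 2010 "Tim"]
           ["B" 2009 "Tom"]])

;; Three grouping functions give three levels of nesting:
;; letter -> year -> name -> matching rows.
(def by-letter-year-name
  (nested-group-by [first second #(nth % 2)] rows))

;; Looking up one path through the nested maps:
(get-in by-letter-year-name ["A" 2011 "Jon"])
;; => [["A" 2011 "Jon"]]
```

Each extra function in the vector adds one more map level, so the lookup path passed to get-in has as many keys as there are grouping functions.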
nested-group-by
还有一个方便的额外功能:final-fn
,默认为identity
,所以如果你不提供,最深的嵌套会返回一个值向量,但如果你提供一个final-fn
,它在最里面的向量上运行。举例说明:如果您只是想知道每个类别和年份中出现了多少行原始数据集:
cljs.user=> (nested-group-by [first second] foo count)
;; the third argument, count, is final-fn
{"A" {2011 2, 2010 1}, "B" {2009 1}}
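This function does not use recur, so a very deep chain of recursive calls could overflow the stack. For the intended use case, with only a handful of grouping functions, that should not be a problem.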
Answer 2 (score: 1)
Here is the solution I came up with. It works, but I'm sure it can be improved.
(defn nest [data criteria]
  (if (empty? criteria)
    data
    (into {} (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
               (hash-map k (nest v (rest criteria)))))))
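The thread does not show how criteria is meant to be passed. The sketch below assumes it is a map from a label to a column index, which is what the use of vals suggests; the labels :letter and :year are hypothetical. It also contrasts this version with the one from the question, renamed nest-seq here for illustration, to show what the added (into {} ...) changes.

```clojure
(def data [["A" 2011 "Dan"]
           ["A" 2011 "Jon"]
           ["A" 2010 "Tim"]
           ["B" 2009 "Tom"]])

;; Assumption: criteria maps a label to a column index.
;; array-map keeps the entries in the order given.
(def criteria (array-map :letter 0 :year 1))

;; The version from the question (hypothetical name nest-seq):
;; `for` yields a lazy sequence of single-entry maps at every level.
(defn nest-seq [data criteria]
  (if (empty? criteria)
    data
    (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
      (hash-map k (nest-seq v (rest criteria))))))

;; The version from this answer: (into {} ...) merges those
;; single-entry maps into one map per level.
(defn nest [data criteria]
  (if (empty? criteria)
    data
    (into {} (for [[k v] (group-by #(nth % (-> criteria vals first)) data)]
               (hash-map k (nest v (rest criteria)))))))

(map? (nest-seq data criteria)) ;; => false, it is a sequence of maps
(map? (nest data criteria))     ;; => true

(nest data criteria)
;; => {"A" {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]},
;;     "B" {2009 [["B" 2009 "Tom"]]}}
```

Note that (rest criteria) turns the map into a sequence of map entries; vals still accepts that sequence, which is why the recursion keeps working after the first level.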
Answer 3 (score: 1)
I suspect this is the most idiomatic version:
(defn nest-by
  [ks coll]
  (let [keyfn (apply juxt ks)]
    (reduce (fn [m x] (update-in m (keyfn x) (fnil conj []) x)) {} coll)))
This takes advantage of the fact that update-in already does most of what you want. In your particular case, you would simply do:
(nest-by [first second] [["A" 2011 "Dan"]
                         ["A" 2011 "Jon"]
                         ["A" 2010 "Tim"]
                         ["B" 2009 "Tom"]])
{"A" {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]], 2010 [["A" 2010 "Tim"]]}, "B" {2009 [["B" 2009 "Tom"]]}}
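To see why update-in does most of the work, it may help to step through a single reduction step by hand. This is only an illustrative sketch; the names row, keyfn, step1, and step2 are not part of the answer.

```clojure
(def row ["A" 2011 "Dan"])

;; juxt builds a function that returns a vector of each function's result,
;; which is exactly the key path that update-in expects.
(def keyfn (juxt first second))

(keyfn row)
;; => ["A" 2011]

;; update-in creates the intermediate maps along the path as needed.
;; (fnil conj []) starts from an empty vector when the path has no value yet.
(def step1 (update-in {} (keyfn row) (fnil conj []) row))
;; => {"A" {2011 [["A" 2011 "Dan"]]}}

;; A second row with the same key path is appended to the existing vector.
(def step2 (update-in step1 ["A" 2011] (fnil conj []) ["A" 2011 "Jon"]))
;; => {"A" {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]]}}
```

The reduce in nest-by simply repeats this step for every row, starting from an empty map.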
Answer 4 (score: 0)
This gets you pretty close:
(defn my-group [coll]
  (let [m (group-by
            #(-> % val first first)
            (group-by #(second %) coll))]
    (into {} (for [[k v] m] [k (#(into {} %) v)]))))
(my-group [["A" 2011 "Dan"] ["A" 2011 "Jon"] ["A" 2010 "Tim"] ["B" 2009 "Tom"]])
{"A" {
2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]],
2010 [["A" 2010 "Tim"]]
},
"B" {2009 [["B" 2009 "Tom"]]}
}
As always with Clojure, you can probably find something less verbose.
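One caveat that seems worth spelling out: my-group groups by the second column first and then files each year group under the first column of its first row. As far as I can tell, this means rows from different categories that share a year end up in the same bucket. The sketch below uses a hypothetical two-row dataset named shared-year to show this, and contrasts it with the two-step group-by from Answer 0.

```clojure
;; my-group repeated from the answer above so this block is self-contained.
(defn my-group [coll]
  (let [m (group-by
            #(-> % val first first)
            (group-by #(second %) coll))]
    (into {} (for [[k v] m] [k (#(into {} %) v)]))))

;; Hypothetical data: two categories that share the year 2011.
(def shared-year [["A" 2011 "Dan"]
                  ["B" 2011 "Tom"]])

(my-group shared-year)
;; => {"A" {2011 [["A" 2011 "Dan"] ["B" 2011 "Tom"]]}}
;; The "B" row is filed under "A" because the year was grouped first.

;; Grouping by the first column before the second keeps the rows apart.
(into {} (for [[k v] (group-by first shared-year)]
           [k (group-by second v)]))
;; => {"A" {2011 [["A" 2011 "Dan"]]}, "B" {2011 [["B" 2011 "Tom"]]}}
```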