Applying "get" to every HashMap element of a LazySeq

Date: 2016-08-28 22:00:22

Tags: clojure hashmap lazy-sequences

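I'm using clojure.data.xml to parse some XML data from Stack Exchange. For example, if I parse the Votes data, it returns a LazySeq containing one HashMap per row of data.

What I'm trying to do is get the values associated with only certain keys for each row, e.g. (get votes [:Id :CreationDate]). I've tried many things, most of which resulted in errors.

The closest I've gotten is (doall (map get votes [:Id :CreationDate])). However, the problem I have now is that I can't seem to get back more than the first row, i.e. (1 2011-01-19T00:00:00.000).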

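Ideally, I'd like to return some kind of collection or map containing the desired values for each row, with the end goal of writing them to something like a CSV file. For example, a map like

(1 2011-01-19T00:00:00.000
 2 2011-01-19T00:00:00.000
 3 2011-01-19T00:00:00.000
 4 2011-01-19T00:00:00.000)

Here is an MCVE that can be run in any Clojure REPL or on the Codepad online IDE:
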
(def votes '({:Id "1",
              :PostId "2",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "2",
              :PostId "3",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "3",
              :PostId "1",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "4",
              :PostId "1",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}))

(println (doall (map get votes [:Id :CreationDate])))

Additional details: in case it is of any help or interest, the code I use to get the lazy seq above is as follows:

(ns se-datadump.read-xml
  (:require
    [clojure.data.xml :as xml]))

(def xml-votes
  "<votes><row Id=\"1\" PostId=\"2\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" />  <row Id=\"2\" PostId=\"3\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" />  <row Id=\"3\" PostId=\"1\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" />  <row Id=\"4\" PostId=\"1\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /></votes>")

(defn se-xml->rows-seq
  "Returns LazySequence from a properly formatted XML string,
  which contains a HashMap for every <row> element with each of its attributes.
  This assumes the standard Stack Exchange XML format, where a parent element contains
  only a series of <row> child elements with no further hierarchy."
  [xml-str]
  (let [xml-records (xml/parse-str xml-str)]
    (map :attrs (:content xml-records))))

;; this returns a sequence of maps identical to the one in the MCVE:
(def votes (se-xml->rows-seq xml-votes))

2 answers:

Answer 0: (score: 3)

It looks like you need juxt:

(map (juxt :Id :CreationDate) votes)
;; => (["1" "2011-01-19T00:00:00.000"] ["2" "2011-01-19T00:00:00.000"] ["3" "2011-01-19T00:00:00.000"] ["4" "2011-01-19T00:00:00.000"])

If you need a map:

(into {} (map (juxt :Id :CreationDate) votes))
;; => {"1" "2011-01-19T00:00:00.000", "2" "2011-01-19T00:00:00.000", "3" "2011-01-19T00:00:00.000", "4" "2011-01-19T00:00:00.000"}
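To make the juxt approach easier to follow, here is a minimal sketch of what it does for a single row and how it might be generalized to a collection of keys. The sample data is a shortened copy of the question's votes; `id-and-date` and `pick` are hypothetical names introduced only for illustration.

```clojure
;; Sample data in the same shape as the question's votes (shortened to two rows).
(def votes '({:Id "1", :PostId "2", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "2", :PostId "3", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}))

;; juxt builds a function that applies each given function to its argument
;; and collects the results in a vector. Keywords act as lookup functions.
(def id-and-date (juxt :Id :CreationDate))

(id-and-date (first votes))
;; => ["1" "2011-01-19T00:00:00.000"]

;; Hypothetical helper: the same idea when the keys are held in a collection.
(defn pick [ks rows]
  (map (apply juxt ks) rows))

(pick [:Id :CreationDate] votes)
;; => (["1" "2011-01-19T00:00:00.000"] ["2" "2011-01-19T00:00:00.000"])
```

Note that the `into {}` variant uses the first key's value as the map key, so it only keeps every row if those values are unique.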

Answer 1: (score: 2)

First, let me explain what the code you posted on CodePad actually does. I doubt it is what you intended:

(println (doall (map get votes [:Id :CreationDate])))

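The key part is (map get votes [:Id :CreationDate]). This maps over two collections: the lazy sequence votes and the vector. Whenever you map over several collections, the returned lazy sequence is only as long as the shortest collection provided. This makes it possible, for example, to map over a finite collection together with an infinite sequence: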

(map + (range) [1 2 3])
;; => (1 3 5)

This explains why your result has only two items:

(map get votes [:Id :CreationDate])

reduces to:

((get (nth votes 0) (nth [:Id :CreationDate] 0))
 (get (nth votes 1) (nth [:Id :CreationDate] 1)))

which reduces to:

((get {:Id "1",
       :PostId "2",
       :VoteTypeId "2",
       :CreationDate "2011-01-19T00:00:00.000"} :Id)
 (get {:Id "2",
       :PostId "3",
       :VoteTypeId "2",
       :CreationDate "2011-01-19T00:00:00.000"} :CreationDate))

and finally reduces to:

(1 2011-01-19T00:00:00.000)
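The walk-through above can be checked at a REPL. The following sketch reuses the question's sample data and only confirms that mapping over two collections stops at the shorter one; the name `result` is introduced here for illustration.

```clojure
;; The question's sample data: four rows.
(def votes '({:Id "1", :PostId "2", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "2", :PostId "3", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "3", :PostId "1", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "4", :PostId "1", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}))

;; Two input collections: four rows and two keys.
;; map pairs them up element by element and stops after the second pair.
(def result (map get votes [:Id :CreationDate]))

(count result)
;; => 2

result
;; => ("1" "2011-01-19T00:00:00.000")

;; The same two lookups written out by hand.
(list (get (nth votes 0) :Id)
      (get (nth votes 1) :CreationDate))
;; => ("1" "2011-01-19T00:00:00.000")
```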

This is only meant as an aid to understanding. Whether the compiler performs exactly these steps is a different question.

doall is not necessary here, because println already realizes the sequence implicitly in order to print it.
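A small sketch can illustrate this point. It assumes the forms are evaluated in order as a script or one at a time at a REPL; `calls`, `lazy-result`, and `printed` are hypothetical names used only for this demonstration.

```clojure
;; Counts how many times the mapping function has actually run.
(def calls (atom 0))

;; Defining the lazy seq does not run the mapping function yet.
(def lazy-result
  (map (fn [x] (swap! calls inc) x) [1 2 3]))

@calls
;; => 0

;; Printing has to walk the whole seq, which realizes it.
;; with-out-str captures the printed text instead of writing to the console.
(def printed (with-out-str (println lazy-result)))

@calls
;; => 3
```

After the last step, `printed` holds the text "(1 2 3)" followed by a line break, without any doall having been involved.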

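As already mentioned, in your case you are better off using juxt and mapping over votes only. If you really want the sample output from your question, you also need to flatten the result: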

(flatten (map (juxt :Id :CreationDate) votes))
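Since the stated end goal is a CSV file, here is one possible sketch for turning the selected values into CSV text using only clojure.string. `rows->csv` is a hypothetical helper, and it assumes the values contain no commas, quotes, or line breaks; for real data a dedicated CSV library (for example clojure.data.csv) would be the safer choice.

```clojure
(require '[clojure.string :as str])

;; Sample data in the same shape as the question's votes (shortened to two rows).
(def votes '({:Id "1", :PostId "2", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "2", :PostId "3", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}))

;; Hypothetical helper: one line per row, values for ks joined by commas.
;; Assumption: no escaping is needed for the values.
(defn rows->csv [ks rows]
  (str/join "\n"
            (map (fn [row] (str/join "," (map #(get row %) ks)))
                 rows)))

(println (rows->csv [:Id :CreationDate] votes))
;; prints:
;; 1,2011-01-19T00:00:00.000
;; 2,2011-01-19T00:00:00.000
```

The resulting string could then be written out with `spit`. Keeping each row as its own vector, rather than flattening, makes this step simpler.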