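I'm using clojure.data.xml to parse some XML data from Stack Exchange. If I parse the Votes data, for example, it returns a LazySeq containing a HashMap for each row of data.
What I want to do is get the values associated with only certain keys for each row, e.g. (get votes [:Id :CreationDate]). I've tried many things, most of which result in errors.
The closest I've got is (doall (map get votes [:Id :CreationDate])). However, the problem now is that I can't seem to get back more than the first row, i.e. (1 2011-01-19T00:00:00.000).
Ideally, I'd like to return some kind of collection or map containing the desired values for each row, with the end goal of writing them to something like a CSV file. For example, a map like (1 2011-01-19T00:00:00.000 2 2011-01-19T00:00:00.000 3 2011-01-19T00:00:00.000 4 2011-01-19T00:00:00.000).
Here is an MCVE that can be run in any Clojure REPL or on the Codepad online IDE: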
(def votes '({:Id "1",
              :PostId "2",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "2",
              :PostId "3",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "3",
              :PostId "1",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "4",
              :PostId "1",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}))
(println (doall (map get votes [:Id :CreationDate])))
Other details: in case it is of any help or interest, the code I used to obtain the lazy seq above is as follows:
(ns se-datadump.read-xml
  (:require
   [clojure.data.xml :as xml]))
(def xml-votes
  "<votes><row Id=\"1\" PostId=\"2\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /> <row Id=\"2\" PostId=\"3\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /> <row Id=\"3\" PostId=\"1\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /> <row Id=\"4\" PostId=\"1\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /></votes>")
(defn se-xml->rows-seq
  "Returns a LazySeq from a properly formatted XML string,
  which contains a HashMap for every <row> element with each of its attributes.
  This assumes the standard Stack Exchange XML format, where a parent element contains
  only a series of <row> child elements with no further hierarchy."
  [xml-str]
  (let [xml-records (xml/parse-str xml-str)]
    (map :attrs (-> xml-records :content))))
; this returns a seq identical to the one in the MCVE:
(def votes (se-xml->rows-seq xml-votes))
Answer 0 (score: 3)
You clearly need juxt:
(map (juxt :Id :CreationDate) votes)
;; => (["1" "2011-01-19T00:00:00.000"] ["2" "2011-01-19T00:00:00.000"] ["3" "2011-01-19T00:00:00.000"] ["4" "2011-01-19T00:00:00.000"])
If you need a map:
(into {} (map (juxt :Id :CreationDate) votes))
;; => {"1" "2011-01-19T00:00:00.000", "2" "2011-01-19T00:00:00.000", "3" "2011-01-19T00:00:00.000", "4" "2011-01-19T00:00:00.000"}
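Since the stated end goal is a CSV-like file, here is a minimal sketch of turning those vectors into CSV text. It assumes the values contain no commas, quotes, or newlines, because no escaping is done. The names `sample-votes` and `rows->csv` are made up for this illustration and are not part of the question's code.

```clojure
(require '[clojure.string :as str])

;; Shortened copy of the sample data from the question.
(def sample-votes
  [{:Id "1" :PostId "2" :VoteTypeId "2" :CreationDate "2011-01-19T00:00:00.000"}
   {:Id "2" :PostId "3" :VoteTypeId "2" :CreationDate "2011-01-19T00:00:00.000"}])

;; Hypothetical helper: picks the given keys from every row and joins
;; the values into comma-separated lines. No quoting or escaping.
(defn rows->csv [ks rows]
  (->> rows
       (map (apply juxt ks))       ; one vector of values per row
       (map #(str/join "," %))     ; one CSV line per row
       (str/join "\n")))

(rows->csv [:Id :CreationDate] sample-votes)
;; => "1,2011-01-19T00:00:00.000\n2,2011-01-19T00:00:00.000"
```

The resulting string could then be passed to `spit` to write a file. For real data with embedded commas or quotes, a dedicated CSV library is probably the safer choice.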
Answer 1 (score: 2)
First, let me explain what the code you posted on CodePad actually does. I doubt it is what you intended:
(println (doall (map get votes [:Id :CreationDate])))
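The key part is: (map get votes [:Id :CreationDate])
This maps over two collections: the lazy sequence votes and the vector. Whenever map is given more than one collection, the returned lazy sequence is only as long as the shortest collection provided.
For example, you can map over an infinite sequence and a finite collection: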
(map + (range) [1 2 3])
;; (1 3 5)
This explains why your result has only two items:
(map get votes [:Id :CreationDate])
reduces to:
((get (nth votes 0) ([:Id :CreationDate] 0))
 (get (nth votes 1) ([:Id :CreationDate] 1)))
which reduces to:
((get {:Id "1",
       :PostId "2",
       :VoteTypeId "2",
       :CreationDate "2011-01-19T00:00:00.000"} :Id)
 (get {:Id "2",
       :PostId "3",
       :VoteTypeId "2",
       :CreationDate "2011-01-19T00:00:00.000"} :CreationDate))
and finally reduces to:
(1 2011-01-19T00:00:00.000)
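These steps are only meant as an aid to understanding. Whether the compiler actually performs exactly these steps is another question.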
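The pairing can be checked at a REPL. The sketch below uses a reduced copy of the data with a different date per row (an assumption made here, so that it is visible which row each value comes from); `dated-votes` is a name introduced only for this example.

```clojure
;; Reduced sample: three rows, each with a distinct date.
(def dated-votes
  '({:Id "1" :CreationDate "2011-01-19T00:00:00.000"}
    {:Id "2" :CreationDate "2011-01-20T00:00:00.000"}
    {:Id "3" :CreationDate "2011-01-21T00:00:00.000"}))

;; map pairs the 1st row with :Id and the 2nd row with :CreationDate,
;; then stops because the two-element key vector is exhausted.
(map get dated-votes [:Id :CreationDate])
;; => ("1" "2011-01-20T00:00:00.000")

;; Written out by hand, that is the same as:
(list (get (nth dated-votes 0) :Id)
      (get (nth dated-votes 1) :CreationDate))
;; => ("1" "2011-01-20T00:00:00.000")
```

Note that the date in the result belongs to the second row, not the first, and the third row is never looked at.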
doall is not necessary here, because println already realizes the whole sequence implicitly.
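A small sketch to see this for yourself: the counter below records how many elements have actually been computed. The names `realized` and `lazy-ids` are hypothetical and exist only for this demonstration.

```clojure
;; Counter that records how many elements have been computed so far.
(def realized (atom 0))

;; The mapping function bumps the counter every time it is called.
(def lazy-ids
  (map (fn [x] (swap! realized inc) x) ["1" "2" "3"]))

@realized
;; => 0, nothing has been computed yet

;; Printing has to walk the whole sequence, which forces it.
(println lazy-ids)

@realized
;; => 3
```

doall is still useful when nothing else consumes the sequence and you need its side effects to run, but printing is already such a consumer.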
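As already mentioned, in your case you are better off using juxt and mapping over votes only. If you really want the sample output, you also need to flatten the result: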
(flatten (map (juxt :Id :CreationDate) votes))
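As an alternative sketch, mapcat concatenates the per-row vectors directly and gives the same result for this data. Unlike flatten, it removes only one level of nesting, so it would leave any nested sequential values intact. `flat-votes` is a name introduced for this example.

```clojure
;; Reduced copy of the sample data.
(def flat-votes
  [{:Id "1" :CreationDate "2011-01-19T00:00:00.000"}
   {:Id "2" :CreationDate "2011-01-19T00:00:00.000"}])

;; juxt yields one [id date] vector per row; mapcat joins them end to end.
(mapcat (juxt :Id :CreationDate) flat-votes)
;; => ("1" "2011-01-19T00:00:00.000" "2" "2011-01-19T00:00:00.000")
```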