I have the following sample XML:
<data>
  <products>
    <product>
      <section>Red Section</section>
      <images>
        <image>img.jpg</image>
        <image>img2.jpg</image>
      </images>
    </product>
    <product>
      <section>Blue Section</section>
      <images>
        <image>img.jpg</image>
        <image>img3.jpg</image>
      </images>
    </product>
    <product>
      <section>Green Section</section>
      <images>
        <image>img.jpg</image>
        <image>img2.jpg</image>
      </images>
    </product>
  </products>
</data>
I know how to parse it in Clojure:
(require '[clojure.xml :as xml])
(def x (xml/parse "location/of/that/xml"))
This returns a nested map describing the XML:
{:tag :data,
:attrs nil,
:content [
{:tag :products,
:attrs nil,
:content [
{:tag :product,
:attrs nil,
:content [] ..
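This structure can of course be traversed with standard Clojure functions, but that can get really verbose, especially compared to, say, querying it with XPath. Are there any helpers to traverse and search such a structure? How could I, for example:
- get a list of all <product> elements
- get only the products whose <images> tag contains an <image> with the text "img2.jpg"
- get the product whose section is "Red Section"
Thanks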
Answer 0 (score: 9)
Using zippers from data.zip, here is a solution for your second use case:
(ns core
  (:require [clojure.zip :as zip]
            [clojure.xml :as xml]
            [clojure.data.zip.xml :refer [xml-> xml1-> text text=]]))
;; PATH is a placeholder for the location of the XML file
(def data (zip/xml-zip (xml/parse PATH)))
(def products (xml-> data :products :product))
;; keep only products that have an <image> whose text is "img2.jpg"
(for [product products :let [image (xml-> product :images :image)]
      :when (some (text= "img2.jpg") image)]
  {:section (xml1-> product :section text)
   :images  (map text image)})
=> ({:section "Red Section", :images ("img.jpg" "img2.jpg")}
    {:section "Green Section", :images ("img.jpg" "img2.jpg")})
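For anyone curious what this kind of navigation boils down to, here is a rough sketch of the same "img2.jpg" query using only clojure.zip and clojure.xml, with no data.zip dependency. The helper names (parse-str, child-locs, descend, loc-text) and the inline sample-xml are made up for this illustration; this is not how data.zip is implemented, just one way to get the same result.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import '(java.io ByteArrayInputStream))

;; Inline copy of the sample document from the question (illustrative only).
(def sample-xml
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     <product><section>Blue Section</section>
       <images><image>img.jpg</image><image>img3.jpg</image></images></product>
     <product><section>Green Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(defn parse-str
  "Parses an XML string with clojure.xml (hypothetical helper)."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-locs
  "Zipper locations of the direct children of loc that carry the given tag."
  [loc tag]
  (->> (iterate zip/right (zip/down loc))
       (take-while some?)
       (filter #(= tag (:tag (zip/node %))))))

(defn descend
  "Follows a path of tags downward from loc and returns every matching location."
  [loc & tags]
  (reduce (fn [locs tag] (mapcat #(child-locs % tag) locs)) [loc] tags))

(defn loc-text
  "Concatenates the string children of the element at loc."
  [loc]
  (apply str (filter string? (:content (zip/node loc)))))

(def root (zip/xml-zip (parse-str sample-xml)))

;; Products that contain an <image> with the text "img2.jpg".
(def img2-products
  (for [product (descend root :products :product)
        :let [images (map loc-text (descend product :images :image))]
        :when (some #{"img2.jpg"} images)]
    {:section (loc-text (first (descend product :section)))
     :images  images}))

(println img2-products)
```

In practice xml-> and xml1-> are much more convenient; the sketch is only meant to show that they sit on top of ordinary zipper moves.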
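Answer 1 (score: 4)
Here is an alternative version using data.zip that covers all three use cases. I have found that xml-> and xml1-> have pretty powerful navigation built in, with sub-queries expressed as vectors.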
;; [org.clojure/data.zip "0.1.1"]
(ns example.core
  (:require
    [clojure.zip :as zip]
    [clojure.xml :as xml]
    [clojure.data.zip.xml :refer [text xml-> xml1->]]))
(def data (zip/xml-zip (xml/parse "/tmp/products.xml")))
(let [all-products (xml-> data :products :product)
      red-section  (xml1-> data :products :product [:section "Red Section"])
      img2         (xml-> data :products :product [:images [:image "img2.jpg"]])]
  {:all-products (map (fn [product] (xml1-> product :section text)) all-products)
   :red-section  (xml1-> red-section :section text)
   :img2         (map (fn [product] (xml1-> product :section text)) img2)})
=> {:all-products ("Red Section" "Blue Section" "Green Section"),
    :red-section "Red Section",
    :img2 ("Red Section" "Green Section")}
Answer 2 (score: 3)
You can use clj-xpath.
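To give an idea of what the three queries look like as XPath expressions, here is a minimal sketch. Note that it does not use clj-xpath's own API (which is not shown in this answer); it calls the JDK's built-in javax.xml.xpath classes through Java interop instead. The helper names parse-dom and xpath-texts, and the inline sample-xml, are made up for the example.

```clojure
(import '(java.io ByteArrayInputStream)
        '(javax.xml.parsers DocumentBuilderFactory)
        '(javax.xml.xpath XPath XPathConstants XPathFactory)
        '(org.w3c.dom Document NodeList))

;; Inline copy of the sample document from the question (illustrative only).
(def sample-xml
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     <product><section>Blue Section</section>
       <images><image>img.jpg</image><image>img3.jpg</image></images></product>
     <product><section>Green Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(defn parse-dom
  "Parses an XML string into a W3C DOM document (hypothetical helper)."
  [^String s]
  (-> (DocumentBuilderFactory/newInstance)
      (.newDocumentBuilder)
      (.parse (ByteArrayInputStream. (.getBytes s "UTF-8")))))

(defn xpath-texts
  "Evaluates expr against doc and returns the text content of each matching node."
  [^Document doc ^String expr]
  (let [^XPath xp       (.newXPath (XPathFactory/newInstance))
        ^NodeList nodes (.evaluate xp expr doc XPathConstants/NODESET)]
    (mapv #(.getTextContent (.item nodes (int %)))
          (range (.getLength nodes)))))

(def doc (parse-dom sample-xml))

;; 1. the section of every product
(println (xpath-texts doc "/data/products/product/section"))
;; 2. sections of the products that contain an image named img2.jpg
(println (xpath-texts doc "/data/products/product[images/image='img2.jpg']/section"))
;; 3. images of the product whose section is "Red Section"
(println (xpath-texts doc "/data/products/product[section='Red Section']/images/image"))
```

A dedicated library should hide most of this interop boilerplate, but the XPath expressions themselves would stay the same.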
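Answer 3 (score: 1)
The Tupelo library can easily solve problems like this using the tupelo.forest tree data structure. Please see this question for more information. API docs can be found here.
Here we load your XML data and convert it first into enlive format, and then into the native tree structure used by tupelo.forest. Libs & data def: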
(ns tst.tupelo.forest-examples
(:use tupelo.forest tupelo.test )
(:require
[clojure.data.xml :as dx]
[clojure.java.io :as io]
[clojure.set :as cs]
[net.cgrand.enlive-html :as en-html]
[schema.core :as s]
[tupelo.core :as t]
[tupelo.string :as ts]))
(t/refer-tupelo)
(def xml-str-prod "<data>
<products>
<product>
<section>Red Section</section>
<images>
<image>img.jpg</image>
<image>img2.jpg</image>
</images>
</product>
<product>
<section>Blue Section</section>
<images>
<image>img.jpg</image>
<image>img3.jpg</image>
</images>
</product>
<product>
<section>Green Section</section>
<images>
<image>img.jpg</image>
<image>img2.jpg</image>
</images>
</product>
</products>
</data> " )
and the initialization code:
(dotest
  (with-forest (new-forest)
    (let [enlive-tree (->> xml-str-prod
                           java.io.StringReader.
                           en-html/html-resource
                           first)
          root-hid    (add-tree-enlive enlive-tree)
          tree-1      (hid->hiccup root-hid)
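The hid suffix stands for "Hex ID", a unique hex value that acts like a pointer to a node/leaf in the tree. At this stage we have just loaded the data into the forest data structure, creating tree-1, which looks like: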
[:data
[:tupelo.forest/raw "\n "]
[:products
[:tupelo.forest/raw "\n "]
[:product
[:tupelo.forest/raw "\n "]
[:section "Red Section"]
[:tupelo.forest/raw "\n "]
[:images
[:tupelo.forest/raw "\n "]
[:image "img.jpg"]
[:tupelo.forest/raw "\n "]
[:image "img2.jpg"]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]
[:product
[:tupelo.forest/raw "\n "]
[:section "Blue Section"]
[:tupelo.forest/raw "\n "]
[:images
[:tupelo.forest/raw "\n "]
[:image "img.jpg"]
[:tupelo.forest/raw "\n "]
[:image "img3.jpg"]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]
[:product
[:tupelo.forest/raw "\n "]
[:section "Green Section"]
[:tupelo.forest/raw "\n "]
[:images
[:tupelo.forest/raw "\n "]
[:image "img.jpg"]
[:tupelo.forest/raw "\n "]
[:image "img2.jpg"]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]]
[:tupelo.forest/raw "\n "]]
We next remove any blank strings with this code:
          blank-leaf-hid?  (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                                          (let [value (hid->value hid)]
                                            (and (string? value)
                                                 (or (zero? (count value)) ; empty string
                                                     (ts/whitespace? value)))))) ; all whitespace string
          blank-leaf-hids  (keep-if blank-leaf-hid? (all-hids))
          >>               (apply remove-hid blank-leaf-hids) ; >> is a throwaway binding name; the call is made for its side effect
          tree-2           (hid->hiccup root-hid)
to produce a much nicer result tree (hiccup format):
[:data
[:products
[:product
[:section "Red Section"]
[:images [:image "img.jpg"] [:image "img2.jpg"]]]
[:product
[:section "Blue Section"]
[:images [:image "img.jpg"] [:image "img3.jpg"]]]
[:product
[:section "Green Section"]
[:images [:image "img.jpg"] [:image "img2.jpg"]]]]]
The following code then computes the answers to the three questions above:
          product-hids         (find-hids root-hid [:** :product])
          product-trees-hiccup (mapv hid->hiccup product-hids)
          img2-paths           (find-paths-leaf root-hid [:data :products :product :images :image] "img2.jpg")
          img2-prod-paths      (mapv #(drop-last 2 %) img2-paths)
          img2-prod-hids       (mapv last img2-prod-paths)
          img2-trees-hiccup    (mapv hid->hiccup img2-prod-hids)
          red-sect-paths       (find-paths-leaf root-hid [:data :products :product :section] "Red Section")
          red-prod-paths       (mapv #(drop-last 1 %) red-sect-paths)
          red-prod-hids        (mapv last red-prod-paths)
          red-trees-hiccup     (mapv hid->hiccup red-prod-hids)]
Results:
(is= product-trees-hiccup
[[:product
[:section "Red Section"]
[:images
[:image "img.jpg"]
[:image "img2.jpg"]]]
[:product
[:section "Blue Section"]
[:images
[:image "img.jpg"]
[:image "img3.jpg"]]]
[:product
[:section "Green Section"]
[:images
[:image "img.jpg"]
[:image "img2.jpg"]]]] )
(is= img2-trees-hiccup
[[:product
[:section "Red Section"]
[:images
[:image "img.jpg"]
[:image "img2.jpg"]]]
[:product
[:section "Green Section"]
[:images
[:image "img.jpg"]
[:image "img2.jpg"]]]])
(is= red-trees-hiccup
[[:product
[:section "Red Section"]
[:images
[:image "img.jpg"]
[:image "img2.jpg"]]]]))))
The complete example can be found in the forest-examples unit test.
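As a side note, the blank-stripping step above can be approximated on a plain hiccup vector with nothing but clojure.walk. This is only a sketch: it works on bare strings rather than on the forest structure or its :tupelo.forest/raw wrappers, and strip-blank-strings and raw-tree are names invented for the example.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; A small hiccup tree with whitespace-only strings left over from indentation.
(def raw-tree
  [:product "\n  "
   [:section "Red Section"] "\n  "
   [:images "\n    " [:image "img.jpg"] "\n    " [:image "img2.jpg"] "\n  "]
   "\n"])

(defn strip-blank-strings
  "Removes blank string children from every vector in a hiccup-style tree."
  [tree]
  (walk/postwalk
    (fn [node]
      (if (vector? node)
        (into [] (remove #(and (string? %) (str/blank? %))) node)
        node))
    tree))

(println (strip-blank-strings raw-tree))
;; prints [:product [:section Red Section] [:images [:image img.jpg] [:image img2.jpg]]]
```

Non-blank text such as "Red Section" is left untouched, so the cleaned tree has the same shape as tree-2 above for a single product.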
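Answer 4 (score: 0)
In many cases the thread-first macro, together with Clojure's map and vector semantics, is an adequate syntax for accessing XML. There are certainly cases where you want something more XML-specific (such as an XPath library), but often the existing language is nearly as concise and adds no dependencies.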
(require '[clojure.xml :as xml]
         '[clojure.pprint :refer [pprint]])
(pprint (-> (xml/parse "/tmp/xml")
            :content first :content second :content first :content first))
"Blue Section"
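Positional access like the above breaks as soon as the document order changes. In the same no-dependencies spirit, here is a sketch that searches by tag using the core function xml-seq. The helpers elements, text-of, image-names and section-name, and the inline sample-xml, are invented for this example.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Inline copy of the sample document from the question (illustrative only).
(def ^String sample-xml
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     <product><section>Blue Section</section>
       <images><image>img.jpg</image><image>img3.jpg</image></images></product>
     <product><section>Green Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(def root (xml/parse (ByteArrayInputStream. (.getBytes sample-xml "UTF-8"))))

(defn elements
  "All elements with the given tag at or below node, in document order."
  [node tag]
  (filter #(= tag (:tag %)) (xml-seq node)))

(defn text-of
  "Concatenates the string children of an element."
  [node]
  (apply str (filter string? (:content node))))

(defn image-names [product] (map text-of (elements product :image)))
(defn section-name [product] (text-of (first (elements product :section))))

;; 1. all products
(def products (elements root :product))

;; 2. sections of the products that contain an image named img2.jpg
(def img2-sections
  (->> products
       (filter #(some #{"img2.jpg"} (image-names %)))
       (map section-name)))

;; 3. the product(s) whose section is "Red Section"
(def red-products
  (filter #(= "Red Section" (section-name %)) products))

(println (map section-name products))
(println img2-sections)
(println (map image-names red-products))
```

xml-seq walks the whole subtree, so this is fine for small documents like the sample; for large files a zipper or XPath query that follows a specific path is likely the better fit.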