Parsing XML in Clojure with arbitrary tags using clj-xpath

Time: 2014-05-28 19:21:21

Tags: xml xpath clojure

I'm trying to parse some XML with clj-xpath. Basically, I want to write a function that looks like this:

(map
         (fn [item]
           {:title ($x:text "./title" item)
            :url  ($x:text "./url" item)})
         (take 5
               ($x "/search/events/event" (xmldoc))))

but that works with arbitrary tags. So far I have this:

ns mashup-dsl.datamodel
(:use
    [clj-xpath.core])
(def data-url "http://api.eventful.com/rest/events/search?  app_key=4H4Vff4PdrTGp3vV&keywords=music&location=Belgrade&date=Future")

(def events-xml
 (fn [] (slurp data-url)))

(def xmldoc
  (fn [] (xml->doc (events-xml))))

(def item (take 5 ($x "/search/events/event" (xmldoc))))

(defn create-xpath [tag] (str "./" tag))

(def tags ["title" "url"])

(defn parse [item]
    (doseq [tag tags])(into {} (keyword tag) ($x:text (create-xpath tag) item)))

But I get this error: TransformerException Extra illegal tokens: '$', 'tag', '@', '64516c52' org.apache.xpath.compiler.XPathParser.error (XPathParser.java:610). So the problem is in the parse function. Any ideas?

2 answers:

Answer 0: (score: 3)

Here is how to extract the first 5 titles:

user=> (map #($x:text "./title" %) (take 5 ($x "//event" (xmldoc))))
("9th International Belgrade Early Music Festival" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "ICTM Study Group on Music and Dance in Southeastern Europe Conference" "New Belgrade Opera, Madlenianum Opera-Theatre, New Trinity Baroque; Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'incoronazione di Poppea\"")

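The doseq in your example is not closed properly, so tag is used outside its binding, and you need to build the XPath expression before applying it to the xml->doc result.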
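A minimal sketch of a corrected parse, assuming the asker's tags vector and a clj-xpath dependency on the classpath. In the original, `(doseq [tag tags])` ends before `tag` is used, so the later `tag` is not the loop variable; the `$`, `tag` and `@` tokens in the error look like a printed function object that ended up in the XPath string. Here `for` builds one `[keyword text]` pair per tag and `into` collects them into a map. The namespace name `example.parse` is made up for illustration.

```clojure
(ns example.parse
  (:require [clj-xpath.core :refer [$x $x:text]]))

;; Tag names to extract from each event node (taken from the question).
(def tags ["title" "url"])

(defn parse
  "Builds a map such as {:title \"...\" :url \"...\"} for one event node."
  [item]
  (into {}
        (for [tag tags]
          ;; Each pair becomes one map entry: tag keyword -> text of ./tag
          [(keyword tag) ($x:text (str "./" tag) item)])))
```

With the question's xmldoc function, this would be used as `(map parse (take 5 ($x "/search/events/event" (xmldoc))))`.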

You can create a helper function that returns a function which extracts the text of a given tag:

(defn tag-fn [tag] (partial $x:text tag))

Now you can generate functions for "title" and "url":

user=> (tag-fn "title")
#<core$partial$fn__4190 clojure.core$partial$fn__4190@71cc2b7a>

user=> (map (tag-fn "title") (take 5 ($x "//event" (xmldoc))))
("9th International Belgrade Early Music Festival" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\"" "ICTM Study Group on Music and Dance in Southeastern Europe Conference" "New Belgrade Opera, Madlenianum Opera-Theatre, New Trinity Baroque; Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'incoronazione di Poppea\"")

Or the URL and the title:

user=> (map (juxt (tag-fn "url") (tag-fn "title")) (take 2 ($x "//event" (xmldoc))))
(["http://eventful.com/belgrade/events/9th-international-belgrade-/E0-001-064654999-7@2014061420?utm_source=apis&utm_medium=apim&utm_campaign=apic" "9th International Belgrade Early Music Festival"] ["http://eventful.com/belgrade/events/belgrade-baroque-academy-mijanovic-gosta-9th-belg-/E0-001-059734872-8?utm_source=apis&utm_medium=apim&utm_campaign=apic" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\""])

Or the URL and the title, built from a vector of tag names:

user=> (map (apply juxt (map tag-fn ["url" "title"])) (take 2 ($x "//event" (xmldoc))))
(["http://eventful.com/belgrade/events/9th-international-belgrade-/E0-001-064654999-7@2014061420?utm_source=apis&utm_medium=apim&utm_campaign=apic" "9th International Belgrade Early Music Festival"] ["http://eventful.com/belgrade/events/belgrade-baroque-academy-mijanovic-gosta-9th-belg-/E0-001-059734871-9?utm_source=apis&utm_medium=apim&utm_campaign=apic" "Belgrade Baroque Academy, Mijanovic, Gosta / 9th Belgrade Early Music Festival / Monteverdi: \"L'Incoronazione di Poppea\""])
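The juxt approach returns vectors, while the question asked for maps keyed by tag. A small sketch of one way to bridge the two, assuming the same tag-fn helper as above: `zipmap` pairs the keywordized tag names with the values that the juxt function extracts. The names `extractor` and `example.juxt-maps` are hypothetical and not part of clj-xpath.

```clojure
(ns example.juxt-maps
  (:require [clj-xpath.core :refer [$x $x:text]]))

;; Same helper as in the answer: a function that extracts the text of `tag`.
(defn tag-fn [tag] (partial $x:text tag))

(defn extractor
  "Returns a function that turns one node into a map keyed by the tag names.
  `tags` must contain at least one tag, because juxt needs at least one function."
  [tags]
  (let [extract (apply juxt (map tag-fn tags))]
    (fn [node]
      (zipmap (map keyword tags) (extract node)))))
```

For example, `(map (extractor ["url" "title"]) (take 2 ($x "//event" (xmldoc))))` should yield maps with :url and :title keys instead of two-element vectors.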

Answer 1: (score: 3)

The simplest form is:

  (def url 
      (str 
          "http://api.eventful.com/rest/events/search?"
          "app_key=4H4Vff4PdrTGp3vV&"
          "keywords=music&"
          "location=Tokyo&"
          "date=Future"))
  (def xml (slurp url))
  (def event-titles (map #($x:text "./title" %) ($x "//event" xml)))

Printing event-titles gives:

  

  ("FLOPPY 10周年纪念「这是电脑音乐'" "IN BUSINESS" "UNIT 10周年勃起" "在0" "\"20140530 - 生病 团队发布党\"" "Fanfare Ciocarlia @ World Beat Festival" "Fanfare Ciocarlia @ Musashino Hall" "DBS呈现PINCH生日 击!!!" "布鲁斯姐妹(来自尊重)" "UNIST第二名 专辑「Acoustic」リリースパーティー「リリースしちゃってウカれ夜(ドヤッ)☆」")

Edit: for a more general-purpose function, you can define:

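(defn search-for [tag local-path]
  ;; xml is the XML string defined above with (def xml (slurp url)).
  ;; For every <tag> element anywhere in the document, return the text
  ;; found at local-path relative to that element.
  (map #($x:text local-path %) ($x (str "//" tag) xml)))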

And use it like this:

 (search-for "event" "@id")

 (search-for "event" "./title")

 (search-for "image" "./url")
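A variation on the same idea, sketched under the assumption that passing the XML in as an argument is preferable to reading a global var, which makes the function easier to try on small inline documents. The name `search-in` and the namespace `example.search` are hypothetical.

```clojure
(ns example.search
  (:require [clj-xpath.core :refer [$x $x:text]]))

(defn search-in
  "For every <tag> element in `xml` (a string or a parsed document), returns
  the text found at `local-path` relative to that element."
  [xml tag local-path]
  (map #($x:text local-path %) ($x (str "//" tag) xml)))
```

For example, `(search-in xml "event" "./title")` would return the same titles as event-titles above. Note that this assumes each matched element has exactly one node at local-path.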