Clojure: Custom functions inside an Enlive selector?

Date: 2013-08-30 12:58:32

Tags: clojure enlive

Here is an example where I use html/text directly inside the selector vector.

(require '[net.cgrand.enlive-html :as html])

(defn fetch-url [url]
  (html/html-resource (java.net.URL. url)))

(defn parse-test []
  (html/select 
   (fetch-url "https://news.ycombinator.com/") 
   [:td.title :a html/text]))

Calling (parse-test) returns a data structure containing the Hacker News headlines:

("In emergency cases a passenger was selected and thrown out of the plane. [2004]" 
 "“Nobody expects privacy online”: Wrong." 
 "The SCUMM Diary: Stories behind one of the greatest game engines ever made" ...)

Cool!

Is it possible to end the selector vector with a custom function, so that the call returns a list of article URLs instead?

Something like: [:td.title :a #(str "https://news.ycombinator.com/" (:href (:attrs %)))]

Edit

Here is one way to achieve this. We can write our own select function:

(defn select+ [coll selector+]
  ;; The last element of selector+ is an ordinary function;
  ;; everything before it is a regular Enlive selector.
  (map
    (peek selector+)
    (html/select coll (pop selector+))))

(defn href [node]
  (:href (:attrs node)))

(defn parse-test []
  (select+ 
   (fetch-url "https://news.ycombinator.com/") 
   [:td.title :a href]))

(parse-test)
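
The same split (map the last element of the vector over the nodes selected by the rest) can be tried without a network call. The sketch below is only an illustration: `sample-nodes` is a hand-written stand-in for a parsed page, `select-then-map` is a hypothetical name, and it assumes Enlive is on the classpath.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical stand-in for the result of html/html-resource.
;; Enlive nodes are plain maps with :tag, :attrs and :content.
(def sample-nodes
  [{:tag :table
    :attrs nil
    :content [{:tag :tr
               :attrs nil
               :content [{:tag :td
                          :attrs {:class "title"}
                          :content [{:tag :a
                                     :attrs {:href "item?id=1"}
                                     :content ["First story"]}]}]}]}])

(defn select-then-map
  "Treats the last element of selector+fn as a function and the
  remaining elements as an Enlive selector."
  [nodes selector+fn]
  (map (peek selector+fn)
       (html/select nodes (pop selector+fn))))

;; An anonymous function, as in the question, works in the last position.
(def hrefs
  (select-then-map sample-nodes
                   [:td.title :a #(get-in % [:attrs :href])]))
```

Because `peek` and `pop` work on the end of a vector, the selector has to be passed as a vector rather than a list.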

1 answer:

Answer 0 (score: 2)

As you suggested in your comment, I think it is clearest to keep the selection of nodes separate from their transformation.

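Enlive itself provides both selectors and transformers: selectors find nodes, and transformers change them. If your intended output is HTML, you can probably get the result you want by combining a selector with a transformer.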
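
For the HTML-output case, a selector can be paired with a transformer through `html/at`. This is a minimal sketch, not code from the original answer: `sample-node` and `base-url` are made-up names, the node is written by hand instead of fetched, and Enlive is assumed to be on the classpath.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical hand-written node standing in for part of a parsed page.
(def sample-node
  {:tag :td
   :attrs {:class "title"}
   :content [{:tag :a
              :attrs {:href "item?id=1"}
              :content ["First story"]}]})

(def base-url "https://news.ycombinator.com/")

;; html/at takes selector/transformation pairs; a transformation is
;; simply a function from a node to a node (or to a seq of nodes).
(def transformed
  (html/at sample-node
    [:td.title :a]
    (fn [node]
      (update-in node [:attrs :href] #(str base-url %)))))

;; html/emit* turns the nodes back into a seq of HTML string fragments.
(def rendered (apply str (html/emit* transformed)))
```

Here the selector decides which nodes are touched and the function decides what happens to them, which is the separation described above.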

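However, since you are after data (a sequence of maps, perhaps?), you can skip the transformation step and just use a sequence comprehension, like this: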

(defn parse-test []
  (for [s (html/select
            (fetch-url "https://news.ycombinator.com/")
            [:td.title :a])]
    {:title (first (:content s))
     :link  (:href (:attrs s))}))

(take 2 (parse-test))
;; => ({:title " \tStartup - Bill Watterson, a cartoonist's advice ",
        :link "http://www.zenpencils.com/comic/128-bill-watterson-a-cartoonists-advice"} 
       {:title "Drug Agents Use Vast Phone Trove Eclipsing N.S.A.’s",
        :link "http://www.nytimes.com/2013/09/02/us/drug-agents-use-vast-phone-trove-eclipsing-nsas.html?hp&_r=0&pagewanted=all"})
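
The first title above keeps its leading whitespace, and `(first (:content s))` would return a node map rather than a string if a link contained nested markup. One possible variation, sketched under the assumption that Enlive is on the classpath, is to take the title with `html/text` and trim it. `sample-nodes` and `links` are hypothetical names used only for this illustration.

```clojure
(require '[net.cgrand.enlive-html :as html]
         '[clojure.string :as str])

;; Hypothetical hand-written nodes; the link text contains a nested <b>.
(def sample-nodes
  [{:tag :td
    :attrs {:class "title"}
    :content [{:tag :a
               :attrs {:href "http://example.com/story"}
               :content [" \tA "
                         {:tag :b :attrs nil :content ["bold"]}
                         " title "]}]}])

(defn links
  "Returns one map per matching link, with whitespace-trimmed text."
  [nodes]
  (for [a (html/select nodes [:td.title :a])]
    {:title (str/trim (html/text a)) ; html/text concatenates nested text
     :link  (get-in a [:attrs :href])}))

(def result (links sample-nodes))
```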