How to parse a result from httpclient in enlive

Asked: 2016-01-24 12:50:01

Tags: clojure enlive

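This example: https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape1.clj

shows how to parse a page from a URL, but I need to go through a SOCKS5 proxy. I can't figure out how to use a proxy in enlive, but I do know how to use one in httpclient. How do I parse the result returned by httpclient? I have the following code, but the last line returns an empty result: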

    ;; :require clause from the ns declaration
    (:require [clojure.set :as set]
              [clj-http.client :as client]
              [clj-http.conn-mgr :as conn-mgr]
              [clj-time.core :as time]
              [jsoup.soup :as soup]
              [clj-time.coerce :as tc]
              [net.cgrand.enlive-html :as html])

    ;; Fetch the page through the SOCKS proxy listening on 127.0.0.1:9150
    (def a (client/get "https://news.ycombinator.com/"
                       {:connection-manager (conn-mgr/make-socks-proxied-conn-manager "127.0.0.1" 9150)
                        :socket-timeout 10000
                        :conn-timeout 10000
                        :client-params {"http.useragent" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"}}))

    ;; a is the whole response map, not HTML
    (def b (html/html-resource a))

    ;; returns an empty result
    (html/select b [:td.title :a])

1 Answer:

Answer 0: (score: 1)

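In enlive, the html-resource fn fetches the page from a URL itself and then converts it into a data structure that can be queried with selectors. It seems that when you pass it an already-completed request (the response map from client/get), it just hands that map back instead of throwing an error, which is why your select finds nothing.

Either way, the function you want is html-snippet, and you want to pass it the body of your response. Like this: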

;; It does not matter whether you use a connection manager, as long as
;; the call returns a response with a body
(def req (client/get "https://news.ycombinator.com/"))

(def body (:body req))
(def nodes (html/html-snippet body))
(html/select nodes [:td.title :a])

;; Or you can put it all together like this

(-> req
    :body 
    html/html-snippet
    (html/select [:td.title :a]))
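
If you want to check the parsing step without the proxy or the network, here is a minimal sketch. It assumes a hand-written map standing in for the clj-http response; the names fake-response and story-links, the example.parse-body namespace, and the example.com URLs are made up for illustration and are not part of enlive or clj-http. It only relies on html-snippet, select, and text from enlive.

```clojure
(ns example.parse-body
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical stand-in for what client/get returns; only :body is used.
(def fake-response
  {:status 200
   :body (str "<html><body><table>"
              "<tr><td class=\"title\"><a href=\"https://example.com/a\">First story</a></td></tr>"
              "<tr><td class=\"title\"><a href=\"https://example.com/b\">Second story</a></td></tr>"
              "</table></body></html>")})

(defn story-links
  "Parses the :body string of a response map and returns one
   {:title ... :url ...} map per link found inside td.title."
  [response]
  (->> (html/select (html/html-snippet (:body response)) [:td.title :a])
       (map (fn [node]
              {:title (html/text node)                 ; text content of the <a>
               :url   (get-in node [:attrs :href])})))) ; href attribute of the <a>

(story-links fake-response)
```

Once this works on the canned body, calling story-links on the real response from client/get (with or without the connection manager) should be the only change needed, since the function only looks at :body.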