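The following link, https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape1.clj,
shows how to parse a page from a URL, but I need to go through a SOCKS5 proxy. I can't figure out how to use a proxy with enlive, though I do know how to use one with an HTTP client (clj-http). How do I parse the result returned by the HTTP client? I have the following code, but the last line returns an empty result: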
(:require [clojure.set :as set]
[clj-http.client :as client]
[clj-http.conn-mgr :as conn-mgr]
[clj-time.core :as time]
[jsoup.soup :as soup]
[clj-time.coerce :as tc]
[net.cgrand.enlive-html :as html]
)
(def a (client/get "https://news.ycombinator.com/"
{:connection-manager (conn-mgr/make-socks-proxied-conn-manager "127.0.0.1" 9150)
:socket-timeout 10000 :conn-timeout 10000
:client-params {"http.useragent" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"}}))
(def b (html/html-resource a))
(html/select b [:td.title :a])
Answer 0 (score: 1)
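With enlive, the html-resource fn performs the fetch from a URL itself and then converts the result into a data structure that can be queried. It seems that when you pass it an already-completed request (the response map from clj-http), it just hands that map back instead of throwing an error, which would explain why the selector matches nothing.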
Either way, the function you want is html-snippet, and you want to pass it the body of the response. Like this:
;; It does not matter whether you are using a connection manager or not,
;; as long as it's returning a result with a body
(def req (client/get "https://news.ycombinator.com/"))
(def body (:body req))
(def nodes (html/html-snippet body))
(html/select nodes [:td.title :a])
;; Or you can put it all together like this
(-> req
    :body
    html/html-snippet
    (html/select [:td.title :a]))
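To tie this back to the proxied request in the question, here is a minimal sketch that keeps the parsing step separate from the fetch, so the parsing can be checked offline. The namespace name, the helper names link-texts and fetch-via-socks, and the sample HTML string are assumptions made for illustration; the clj-http and enlive calls are the ones already used above, plus enlive's html/text.

```clojure
(ns tutorial.proxy-scrape ; hypothetical namespace name
  (:require [clj-http.client :as client]
            [clj-http.conn-mgr :as conn-mgr]
            [net.cgrand.enlive-html :as html]))

(defn link-texts
  "Parses an HTML string and returns the text of every link inside a td.title cell."
  [body]
  (->> (html/select (html/html-snippet body) [:td.title :a])
       (map html/text)))

(defn fetch-via-socks
  "Fetches url through a SOCKS proxy at host:port and returns the response body string."
  [url host port]
  (:body (client/get url
                     {:connection-manager (conn-mgr/make-socks-proxied-conn-manager host port)
                      :socket-timeout 10000
                      :conn-timeout 10000})))

;; Offline check with a made-up HTML fragment (no network or proxy needed).
(def sample-html
  "<table><tr><td class=\"title\"><a href=\"https://example.com\">Example</a></td></tr></table>")

(link-texts sample-html)

;; With the proxy from the question running locally:
;; (link-texts (fetch-via-socks "https://news.ycombinator.com/" "127.0.0.1" 9150))
```

The key point is that html-snippet receives the :body string, never the whole response map.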