Question

When I evaluate the following code

(org.httpkit.client/get "http://localhost:81"
                    #(clojure.pprint/pprint (.getBytes (:body %))))

it prints

[-17, -65, -67, -17, -65, -67]

if index.html is encoded in CP1251, and

[-48, -80, -48, -79, -48, -78]

if the same file is encoded in UTF-8.

index.html contains the Russian text

абв

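http-kit returns the response body as a String decoded as UTF-8, without taking the actual charset of the HTML document into account. This leaves garbage in the body, like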

"<html>�����</html>"

How can I make org.httpkit.client/get respect the document's charset?

Answer 1

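You can use org.httpkit.client/request with the :as :byte-array option to get the raw bytes of the body and decode them yourself.

If the document is encoded in CP1251, the following code prints the correct body content.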

;; :as :byte-array returns the body as raw bytes instead of a decoded String
(org.httpkit.client/request {:url "http://localhost:81" :as :byte-array}
                            #(println (String. (:body %) "cp1251")))
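
The snippet above hard-codes the charset. If the server declares it in the Content-Type header, a small helper could pick it up instead. The following is only a sketch under that assumption: `charset-from-content-type`, `decode-body` and the `default` parameter are hypothetical names, not part of http-kit, and charsets declared only in an HTML `<meta>` tag are not handled.

```clojure
(import '(java.nio.charset Charset))

(defn charset-from-content-type
  "Hypothetical helper: returns the charset name from a Content-Type value
  such as \"text/html; charset=windows-1251\", or nil if none is declared."
  [content-type]
  (when content-type
    (second (re-find #"(?i)charset\s*=\s*\"?([^\s;\"]+)" content-type))))

(defn- supported-charset?
  "True when the JVM knows the charset; illegal names count as unsupported."
  [cs]
  (try
    (Charset/isSupported cs)
    (catch IllegalArgumentException _ false)))

(defn decode-body
  "Hypothetical helper: decodes raw body bytes with the charset declared in
  content-type, falling back to `default` when none is declared or the
  declared one is unknown to the JVM."
  [^bytes body content-type default]
  (let [declared (charset-from-content-type content-type)
        cs       (if (and declared (supported-charset? declared))
                   declared
                   default)]
    (String. body (Charset/forName cs))))
```

Inside the callback this might be used as `(decode-body (:body resp) (get-in resp [:headers :content-type]) "UTF-8")`, assuming the request was made with `:as :byte-array` and that the response headers are keyed by lower-case keywords; check both against your http-kit version.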