Spam classifier in Clojure

Date: 2018-10-31 19:46:54

Tags: machine-learning clojure

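I have been trying to implement a spam classifier in Clojure, using the book Programming Collective Intelligence as my reference. This is the train function that trains the classifier: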

(defn train
  [t cat]
  (incc cat)
  (let [ws (keys (getwords t))]
    (for [w ws] (incf w cat))))

This is the sampletrain function I wrote. It just dumps some training data into the classifier so that I don't have to train it by hand every time.

(defn sampletrain
  []
  (do
    (train "Nobody owns the water." "good")
    (train "the quick rabit jumps fences" "good")
    (train "buy pharmaceuticals now" "bad")
    (train "make quick money at the online casino" "bad")
    (train "the quick brown fox jumps" "good")))

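Unfortunately, sampletrain only trains my classifier on the last item, the sentence "the quick brown fox jumps" categorized as "good". Afterwards my classifier looks like this: {"the" {"good" 1}, "quick" {"good" 1}, "brown" {"good" 1}, "fox" {"good" 1}, "jumps" {"good" 1}}. As you can see, it has only been trained on the last item. I wrapped everything in a do form to avoid this, but I can't figure out why only the last call to train takes effect.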

1 answer:

Answer 0 (score: 3)

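Clojure uses implicit return, and do uses implicit return as well. So train is called for every sentence, but you only get back the value of the last expression evaluated. You can wrap the calls in a data structure to return all of the results.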
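As a small illustration of that point, the sketch below shows do evaluating every form in order while returning only the value of the last one. The names calls and note! are hypothetical helpers made up for this example; they are not part of the question's code.

```clojure
;; Hypothetical helpers for illustration only.
(def calls (atom []))

(defn note!
  "Records x in the calls atom (a side effect) and returns x."
  [x]
  (swap! calls conj x)
  x)

;; do evaluates each form in order, but its value is the last form's value.
(def result
  (do
    (note! :a)
    (note! :b)
    (note! :c)))

(println "value of do:" result)
(println "forms evaluated:" @calls)
```

All three calls run and are recorded, yet only the value of the final one comes back from the do form.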

With the results wrapped in a vector:

(defn sampletrain
  []
  [(train "Nobody owns the water." "good")
   (train "the quick rabit jumps fences" "good")
   (train "buy pharmaceuticals now" "bad")
   (train "make quick money at the online casino" "bad")
   (train "the quick brown fox jumps" "good")])
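One further factor that likely explains the symptom: for in Clojure returns a lazy sequence, so the incf calls inside train only run when that sequence is realized. Inside do, the sequences from the first four calls are discarded without being realized, and only the last one is realized when the REPL prints the return value. Wrapping the calls in a vector works because printing the vector realizes every sequence. A more direct approach may be to use doseq, which is eager and intended for side effects. The sketch below assumes a shape for the parts the question does not show: the atoms fc and cc, and the bodies of getwords, incf, and incc, are hypothetical stand-ins, not the asker's actual code.

```clojure
(require '[clojure.string :as str])

;; Hypothetical state, assumed for this sketch (not shown in the question).
(def fc (atom {}))   ; word -> category -> count
(def cc (atom {}))   ; category -> count

(defn getwords
  "Assumed behaviour: a map from each distinct lower-cased word to 1."
  [t]
  (let [words (re-seq #"[a-z]+" (str/lower-case t))]
    (zipmap words (repeat 1))))

(defn incf
  "Assumed behaviour: increments the count of word f in category cat."
  [f cat]
  (swap! fc update-in [f cat] (fnil inc 0)))

(defn incc
  "Assumed behaviour: increments the number of items seen in category cat."
  [cat]
  (swap! cc update-in [cat] (fnil inc 0)))

(defn train
  [t cat]
  (incc cat)
  ;; doseq runs its body eagerly for every word; for would only build a lazy seq.
  (doseq [w (keys (getwords t))]
    (incf w cat)))

(defn sampletrain
  []
  (train "Nobody owns the water." "good")
  (train "the quick rabit jumps fences" "good")
  (train "buy pharmaceuticals now" "bad")
  (train "make quick money at the online casino" "bad")
  (train "the quick brown fox jumps" "good"))
```

With train written this way, the function body of sampletrain needs neither do nor a wrapping vector, because every word count is updated at the moment train is called rather than when a returned sequence happens to be consumed. If the lazy for has to stay, wrapping it in doall would be another way to force it.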