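Would you write this algorithm in Haskell using if/else? Is there a way to express it without them? It is hard to extract meaningful functions from the middle of it, since this is just the output of a machine learning system.
I am implementing the algorithm for classifying fragments of HTML content as either content or boilerplate, described here. The weights are already hard-coded.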
curr_linkDensity <= 0.333333
| prev_linkDensity <= 0.555556
| | curr_numWords <= 16
| | | next_numWords <= 15
| | | | prev_numWords <= 4: BOILERPLATE
| | | | prev_numWords > 4: CONTENT
| | | next_numWords > 15: CONTENT
| | curr_numWords > 16: CONTENT
| prev_linkDensity > 0.555556
| | curr_numWords <= 40
| | | next_numWords <= 17: BOILERPLATE
| | | next_numWords > 17: CONTENT
| | curr_numWords > 40: CONTENT
curr_linkDensity > 0.333333: BOILERPLATE
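Answer 0 (score: 11)
Rather than simplifying the logic by hand (presumably you are generating this code automatically), I think using MultiWayIf is quite clean and direct.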
{-# LANGUAGE MultiWayIf #-}

data Stats = Stats {
    curr_linkDensity :: Double,
    prev_linkDensity :: Double,
    ...
}

data Classification = Content | Boilerplate

-- Each nested multi-way if mirrors one level of the decision tree.
classify :: Stats -> Classification
classify s = if
    | curr_linkDensity s <= 0.333333 -> if
        | prev_linkDensity s <= 0.555556 -> if
            | curr_numWords s <= 16 -> if
                | next_numWords s <= 15 -> if
                    | prev_numWords s <= 4 -> Boilerplate
                    | prev_numWords s > 4  -> Content
                | next_numWords s > 15 -> Content
            ...
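And so on.
However, since this is so structured (just a tree of if/else with comparisons), also consider creating a decision tree data structure and writing an interpreter for it. That would let you transform, manipulate, and inspect the tree. Maybe it will buy you something; defining a miniature language for your specification can be surprisingly beneficial.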
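For reference, here is a minimal complete sketch of the same idea, transcribed from the tree in the question. The word-count fields and their `Double` type are assumptions (the original record elides them), and the final guard at each level is written as `otherwise` rather than the explicit `>` test, so that no input can fall through every guard.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Assumed record layout: field names follow the tree's feature names.
-- The three word-count fields and their Double type are guesses.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- One nested multi-way if per level of the decision tree.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | otherwise            -> Content
              | otherwise -> Content
          | otherwise -> Content
      | otherwise -> if
          | curr_numWords s <= 40 -> if
              | next_numWords s <= 17 -> Boilerplate
              | otherwise             -> Content
          | otherwise -> Content
  | otherwise -> Boilerplate
```

With this record layout, `classify (Stats 0.1 0.1 10 3 10)` evaluates to `Boilerplate`, and raising `curr_numWords` to 20 in the same call gives `Content`.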
data DecisionTree i o
    = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
    | Leaf o

-- Walk the tree: take the first branch when the feature is <= the threshold.
runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
    | f i <= v  = runDecisionTree ifLess i
    | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- DecisionTree is an encoding of a function, and you can write
-- Functor, Applicative, and Monad instances!
Then:
classifier :: DecisionTree Stats Classification
classifier =
    Comparison curr_linkDensity 0.333333
        (Comparison prev_linkDensity 0.555556
            (Comparison curr_numWords 16
                (Comparison next_numWords 15
                    (Comparison prev_numWords 4
                        (Leaf Boilerplate)
                        (Leaf Content))
                    (Leaf Content))
                ...
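To make the "transform and inspect" point concrete, here is a self-contained sketch that fills in the whole tree from the question and adds a few helpers. The `Functor` instance, `leaves`, and `depth` are illustrative additions rather than part of the answer above, and the `Stats` fields are assumed to all be `Double` so that every selector fits `i -> Double`.

```haskell
-- Assumed record: all features are Double so each selector fits (i -> Double).
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

-- Interpreter: first subtree when the feature is <= threshold, else second.
runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLessEq ifGreater) i
  | f i <= v  = runDecisionTree ifLessEq i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- Transformation: relabel every leaf without touching the comparisons.
instance Functor (DecisionTree i) where
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)
  fmap g (Leaf o)             = Leaf (g o)

-- Inspection: all leaf labels, left to right.
leaves :: DecisionTree i o -> [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r
leaves (Leaf o)             = [o]

-- Inspection: the largest number of comparisons on any path.
depth :: DecisionTree i o -> Int
depth (Comparison _ _ l r) = 1 + max (depth l) (depth r)
depth (Leaf _)             = 0

-- The full tree from the question.
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17
          (Leaf Boilerplate)
          (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)
```

Here `runDecisionTree classifier` plays the role of `classify`, `fmap (== Boilerplate) classifier` turns the tree into a Boolean-valued one, and `leaves classifier` shows that three of the eight leaves are `Boilerplate`.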
Answer 1 (score: 6)
Since only three paths in this decision tree lead to the BOILERPLATE state, I would just enumerate and simplify them:
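-- The first two clauses can omit curr_linkDensity <= 0.333333,
-- because the last clause already covers the opposite case.
isBoilerplate :: Stats -> Bool
isBoilerplate s =
       prev_linkDensity s <= 0.555556 && curr_numWords s <= 16
         && next_numWords s <= 15 && prev_numWords s <= 4
    || prev_linkDensity s > 0.555556 && curr_numWords s <= 40 && next_numWords s <= 17
    || curr_linkDensity s > 0.333333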
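A hand-simplified condition like this is easy to get subtly wrong, so it may be worth checking it against a literal transcription of the tree. The sketch below does that on a small grid of values straddling each threshold. The `Stats` layout, the helper names `treeSaysBoilerplate`, `samples`, and `agreesOnSamples`, and the sample values are all assumptions for illustration. The first clause here includes the `next_numWords <= 15` test that the tree requires on that path.

```haskell
-- Assumed record: field names follow the tree; all fields taken as Double.
data Stats = Stats
  { curr_linkDensity, prev_linkDensity :: Double
  , curr_numWords, prev_numWords, next_numWords :: Double
  }

-- Literal transcription of the tree, one if/else per node.
treeSaysBoilerplate :: Stats -> Bool
treeSaysBoilerplate s =
  if curr_linkDensity s <= 0.333333
    then if prev_linkDensity s <= 0.555556
      then if curr_numWords s <= 16
        then if next_numWords s <= 15
          then prev_numWords s <= 4
          else False
        else False
      else if curr_numWords s <= 40
        then next_numWords s <= 17
        else False
    else True

-- The three BOILERPLATE paths, joined with ||.
isBoilerplate :: Stats -> Bool
isBoilerplate s =
     prev_linkDensity s <= 0.555556 && curr_numWords s <= 16
       && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s >  0.555556 && curr_numWords s <= 40
       && next_numWords s <= 17
  || curr_linkDensity s >  0.333333

-- Sample points on both sides of every threshold (values chosen by hand).
samples :: [Stats]
samples =
  [ Stats cl pl cw pw nw
  | cl <- [0.1, 0.333333, 0.5]
  , pl <- [0.2, 0.555556, 0.9]
  , cw <- [10, 16, 17, 40, 41]
  , pw <- [3, 4, 5]
  , nw <- [10, 15, 16, 17, 18]
  ]

-- True when both formulations agree on every sample.
agreesOnSamples :: Bool
agreesOnSamples =
  all (\s -> isBoilerplate s == treeSaysBoilerplate s) samples
```

A sampled check is not a proof, but it does catch a dropped conjunct. For instance, `Stats 0.1 0.2 10 3 20` is CONTENT according to the tree because `next_numWords` exceeds 15, so a first clause without that test would disagree at this point.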