How would you express this in Haskell?

Time: 2015-07-14 18:19:35

Tags: algorithm haskell if-statement

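Would you use if/else to write this algorithm in Haskell? Is there a way to express it without them? It is hard to extract meaningful functions from the middle of it, since this is just the output of a machine learning system.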

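I am implementing the algorithm described here for classifying fragments of HTML content as either content or boilerplate. The weights are already hard-coded: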

curr_linkDensity <= 0.333333
| prev_linkDensity <= 0.555556
| | curr_numWords <= 16
| | | next_numWords <= 15
| | | | prev_numWords <= 4: BOILERPLATE
| | | | prev_numWords > 4: CONTENT
| | | next_numWords > 15: CONTENT
| | curr_numWords > 16: CONTENT
| prev_linkDensity > 0.555556
| | curr_numWords <= 40
| | | next_numWords <= 17: BOILERPLATE
| | | next_numWords > 17: CONTENT
| | curr_numWords > 40: CONTENT
curr_linkDensity > 0.333333: BOILERPLATE

2 answers:

Answer 0 (score: 11)

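Rather than simplifying the logic by hand (assuming you may be generating this code automatically), I think MultiWayIf expresses it cleanly and directly.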

{-# LANGUAGE MultiWayIf #-}

data Stats = Stats {
    curr_linkDensity :: Double,
    prev_linkDensity :: Double,
    ...
}

data Classification = Content | Boilerplate

classify :: Stats -> Classification
classify s = if
    | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
        | curr_numWords s <= 16 -> if
          | next_numWords s <= 15 -> if
            | prev_numWords s <= 4 -> Boilerplate
            | prev_numWords s > 4 -> Content
          | next_numWords s > 15 -> Content
      ...

And so on.
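For reference, here is one way the elided parts could be filled in, as a minimal sketch. The record field names come from the decision tree in the question; the field types (`Double` for densities, `Int` for word counts) and the derived `Eq`/`Show` instances are assumptions, not something the question specifies. The second branch of each comparison is written as `otherwise` instead of the explicit `>` test.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Field names follow the decision tree; the types are assumptions.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Int
  , prev_numWords    :: Int
  , next_numWords    :: Int
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- Direct transcription of the tree; nesting is expressed by indentation.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | otherwise            -> Content
              | otherwise -> Content
          | otherwise -> Content
      | otherwise -> if
          | curr_numWords s <= 40 -> if
              | next_numWords s <= 17 -> Boilerplate
              | otherwise             -> Content
          | otherwise -> Content
  | otherwise -> Boilerplate
```

Using `otherwise` makes every multi-way `if` total. With paired `<=` and `>` guards, a NaN link density would satisfy neither guard and fail at run time; with `otherwise` it falls into the second branch instead. Whether that is the desired behaviour for NaN is a separate decision.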

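However, since this is so structured (just a tree of if/else with comparisons), also consider creating a decision tree data structure and writing an interpreter for it. That lets you transform, manipulate, and inspect the tree. Maybe it will buy you something; defining a miniature language for your specification can be surprisingly beneficial.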

data DecisionTree i o 
    = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
    | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
    | f i <= v  = runDecisionTree ifLess i
    | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- DecisionTree is an encoding of a function, and you can write
-- Functor, Applicative, and Monad instances!
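To make the "transform and inspect" point concrete, here is a small sketch of a `Functor` instance and two inspection functions. The helpers `leaves` and `depth` are hypothetical names introduced for illustration; the data type and interpreter are repeated so the snippet stands alone.

```haskell
data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

-- Walk the tree: take the first subtree when the feature is <= the threshold.
runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLessEq ifGreater) i
  | f i <= v  = runDecisionTree ifLessEq i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- Transformation: relabel every leaf while keeping the comparisons intact.
instance Functor (DecisionTree i) where
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)
  fmap g (Leaf o)             = Leaf (g o)

-- Inspection: all outcomes, left to right.
leaves :: DecisionTree i o -> [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r
leaves (Leaf o)             = [o]

-- Inspection: the largest number of comparisons on any path.
depth :: DecisionTree i o -> Int
depth (Comparison _ _ l r) = 1 + max (depth l) (depth r)
depth (Leaf _)             = 0
```

With these, something like `fmap (== Boilerplate) tree` would turn a classifier into a Boolean predicate without touching its thresholds, assuming the leaf type has an `Eq` instance.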

Then

 classifier :: DecisionTree Stats Classification
 classifier =
     Comparison curr_linkDensity 0.333333
       (Comparison prev_linkDensity 0.555556
         (Comparison curr_numWords 16
           (Comparison next_numWords 15
             (Comparison prev_numWords 4
               (Leaf Boilerplate)
               (Leaf Content))
             (Leaf Content))
           ...
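Written out in full, the tree from the question might look like the sketch below. Because `Comparison` takes an accessor of type `i -> Double`, this version assumes every field of `Stats` is a `Double`; with `Int` word counts you would compose the accessor with `fromIntegral` instead. The field types and the derived instances are assumptions.

```haskell
data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLessEq ifGreater) i
  | f i <= v  = runDecisionTree ifLessEq i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- Assumption: all features are stored as Double so the accessors fit Comparison.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- First subtree: feature <= threshold; second subtree: feature > threshold.
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17
          (Leaf Boilerplate)
          (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)
```

A fragment is then classified with `runDecisionTree classifier stats`.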

Answer 1 (score: 6)

Since only three paths in this decision tree lead to the BOILERPLATE state, I would just enumerate and simplify them:

isBoilerplate =
  prev_linkDensity   <= 0.555556 && curr_numWords <= 16 && next_numWords <= 15 && prev_numWords <= 4
  || prev_linkDensity > 0.555556 && curr_numWords <= 40 && next_numWords <= 17
  || curr_linkDensity > 0.333333
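As a self-contained sketch, the same predicate can take the statistics as an argument. The `Stats` record and its field types are assumptions; the conditions are the three BOILERPLATE paths of the tree. The `curr_linkDensity <= 0.333333` test is dropped from the two longer paths because the remaining disjunct already covers every case where it fails.

```haskell
-- Field names follow the decision tree; the types are assumptions.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Int
  , prev_numWords    :: Int
  , next_numWords    :: Int
  }

-- True exactly on the three paths that end in BOILERPLATE.
-- (&&) binds tighter than (||), so each group below is one path.
isBoilerplate :: Stats -> Bool
isBoilerplate s =
     curr_linkDensity s > 0.333333
  || prev_linkDensity s <= 0.555556 && curr_numWords s <= 16
       && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s > 0.555556 && curr_numWords s <= 40
       && next_numWords s <= 17
```

This is compact, but unlike the tree forms it has to be re-derived by hand whenever the learned tree changes.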