Accumulating value counts in a list of tuples in Haskell

Time: 2019-01-05 10:30:31

Tags: list haskell count iteration

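I am trying to parse a list using a pattern string that indicates the type of each value (annual or quarterly). In the resulting output I need to accumulate the quarter number. So far I have come up with this: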

row = [100, 10, 40, 25, 25]
fmt = "aqqqq"
expected = [('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25)]

count :: Char -> String -> Int
count letter str = length $ filter (== letter) str

split :: String -> [a] -> [(Char, Int, a)]
split fmt row = [(freq, count freq (fmt' i), x)   
               | (freq, x, i) <- zip3 fmt row [0..]]
               where fmt' i = take (i+1) fmt

-- split "aqqqq" [100, 10, 40, 25, 25]
-- [('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25)]

I think there should be something more readable and more performant than this code, maybe even a nice one-liner.

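I also tried expanding "aqqqq" into a list of tuples [('a',1),('q',1),('q',2),('q',3),('q',4)] and then adding the values; maybe that is the better approach, since I need to specify the format once for several rows.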

3 Answers:

Answer 0 (score: 6)

If you already have a function expand that expands "aqqqq" into a list of tuples, you can do the rest with zipWith:

Prelude> zipWith (\(p, ix) x -> (p, ix, x)) (expand fmt) row
[('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25)]

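The expand function produces tuples of type Num t => (Char, t). I have called the values in that tuple p (for period) and ix (for index). Zipping that list of tuples with row also yields the value, which I simply call x in the lambda expression.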
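The answer above assumes that expand already exists. As a minimal sketch, here is one possible way to write it so that the zipWith call can be run on its own. The definition of expand via inits and the name splitRow are assumptions made for illustration, not the answerer's code.

```haskell
import Data.List (inits)

-- Hypothetical helper: pair each character with the number of times it has
-- occurred so far (1-based), by counting it in the prefix that ends at it.
expand :: String -> [(Char, Int)]
expand fmt = zipWith runningCount fmt (drop 1 (inits fmt))
  where
    runningCount c prefix = (c, length (filter (== c) prefix))

-- Attach the values of one row to the expanded format, as in the answer.
splitRow :: String -> [a] -> [(Char, Int, a)]
splitRow fmt = zipWith (\(p, ix) x -> (p, ix, x)) (expand fmt)
```

With these definitions, splitRow "aqqqq" [100, 10, 40, 25, 25] evaluates to the expected list from the question. Counting within each prefix is quadratic in the length of the format string, which is likely fine for short formats.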

Answer 1 (score: 2)

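The main problem here is how to convert a string (e.g. "aqqqq") into a list of the running frequencies of the characters that occur in it, i.e. we want: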

"aqqqq" => [1, 1, 2, 3, 4]

Once the frequency list has been constructed, we can use zip3 to produce the expected list of tuples:

[('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25)]

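Clearly, a plain map cannot produce the required frequency list, because each count depends on the characters seen before it. To handle this, I suggest keeping the running counts in a Data.Map, which reduces the cost of finding each count from O(n) (rescanning the prefix) to O(log k), where k is the number of distinct characters.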

Counting the frequencies is simple with insertWith:

countFreq  c m = insertWith (+) c 1 m

and lookup retrieves the accumulated value:

accumValue c m = fromMaybe 0 (Map.lookup c m) + 1

Now building the required list is straightforward:

mkAccumList (c:cs) m = accumValue c m : mkAccumList cs (countFreq c m)
mkAccumList [] _ = []

Putting it all together:

import Data.Map as Map (empty, lookup, insertWith)
import Data.Maybe (fromMaybe)

-- Record one more occurrence of c in the map of counts.
countFreq  c m = insertWith (+) c 1 m
-- Running count for c, including the current occurrence.
accumValue c m = fromMaybe 0 (Map.lookup c m) + 1

split :: String -> [a] -> [(Char, Int, a)]
split fmt row = zip3 fmt (mkAccumList fmt Map.empty) row
    where mkAccumList (c:cs) m = accumValue c m : mkAccumList cs (countFreq c m)
          mkAccumList [] _ = []

To use it with infinite lists:

take 8 $ split (cycle "aqqqq") (cycle [100, 10, 40, 25, 25])

gives

[('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25),('a',2,100),('q',5,10),
('q',6,40)]    
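The explicit recursion that threads a map through the list can also be written with mapAccumL from Data.List. This is a swapped-in alternative rather than the answerer's code, and the names runningCounts and splitAccum are made up for illustration.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as Map

-- Running count of each character: the accumulator is a map from each
-- character to the number of times it has been seen so far.
runningCounts :: String -> [Int]
runningCounts = snd . mapAccumL step Map.empty
  where
    step seen c =
      let n = Map.findWithDefault 0 c seen + 1
      in (Map.insert c n seen, n)

-- Combine the format characters, their running counts and the row values.
splitAccum :: String -> [a] -> [(Char, Int, a)]
splitAccum fmt = zip3 fmt (runningCounts fmt)
```

Here runningCounts "aqqqq" gives [1,1,2,3,4], so splitAccum "aqqqq" [100, 10, 40, 25, 25] matches the expected output from the question.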

Answer 2 (score: 1)

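Following @Mark Seemann's suggestion, below is a complete listing with the solution. I changed the lambda to a named function for readability and introduced a type for the row format.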

count :: Char -> String -> Int
count letter str = length $ filter (== letter) str

type RowFormat = [Char]
expand :: RowFormat -> [(Char, Int)]
expand pat = [(c, count c (take (i+1) pat)) | (c, i) <- zip pat [0..]]

split' :: RowFormat -> [a] -> [(Char, Int, a)]
split' fmt values = zipWith merge (expand fmt) values
      where merge (freq, period) value = (freq, period, value) 

The result is as expected:

*Main> split' "aqqqq" [100, 10, 40, 25, 25]
[('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25)]

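An afterthought: I still expand the format string every time a row is parsed, and probably even the partial application parse = split' "aqqqq" would be computed lazily on each call. Here is my attempt at making a dedicated reader function: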

makeSplitter fmt = \values -> zipWith merge pos values
      where 
        merge (freq, period) value = (freq, period, value)
        pos = expand fmt 
splitRow = makeSplitter "aqqqq" 
a = splitRow [100, 10, 40, 25, 25]

a is the expected result, the same as above:

[('a',1,100),('q',1,10),('q',2,40),('q',3,25),('q',4,25)]
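As a sketch of how such a splitter might be reused across several rows, the listing below applies one splitter to a small list of rows. The sample data in rows and the name parsed are hypothetical. Because pos is bound outside the lambda, the expanded format should be shared between calls of the same splitter, although this relies on how GHC evaluates the closure rather than on anything the language guarantees.

```haskell
-- Running-count expansion, as in the answer above.
expand :: String -> [(Char, Int)]
expand pat = [(c, count c (take (i + 1) pat)) | (c, i) <- zip pat [0 ..]]
  where
    count letter = length . filter (== letter)

-- Build a row parser for one format; pos is computed from fmt only.
makeSplitter :: String -> [a] -> [(Char, Int, a)]
makeSplitter fmt = \values -> zipWith merge pos values
  where
    pos = expand fmt
    merge (freq, period) value = (freq, period, value)

-- Hypothetical sample data: two rows that share one format.
rows :: [[Int]]
rows = [[100, 10, 40, 25, 25], [200, 50, 50, 50, 50]]

-- Parse every row with the same splitter.
parsed :: [[(Char, Int, Int)]]
parsed = map (makeSplitter "aqqqq") rows
```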