How to optimize filtering a list

Time: 2014-10-31 18:12:55

Tags: haskell

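To learn Haskell, I have been solving programming challenges. While trying to solve a Hackerrank problem about filtering the elements of lists with thousands of elements, I have not been able to pass the time tests.

Problem statement: from a list of elements, filter those that occur more than k times, and print them in the order of their appearance.

The best I have come up with so far is this code: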

import qualified Data.ByteString.Char8 as BSC
import Data.Maybe (fromJust)
import Data.List (intercalate, elemIndices, foldl')
import qualified Data.Set as S

-- Improved version of nub
nub' :: (Ord a) => [a] -> [a]
nub' = go S.empty
  where go _ [] = []
        go s (x:xs) | S.member x s = go s xs
                    | otherwise    = x : go (S.insert x s) xs

-- Extract Int from ByteString     
getIntFromBS :: BSC.ByteString -> Int
getIntFromBS = fst . fromJust . BSC.readInt

{- 
    Parse read file:

    a k1
    n1 n2 n3 n4 ... 
    c k2
    m1 m2 m3 m4 ...

    into appropriate format:

    [(k1, [n1,n2,n3,n4]), (k2, [m1,m2,m3,m4])]
-}
createGroups :: [BSC.ByteString] -> [(Int, [Int])]
createGroups [] = []
createGroups (p:v:xs) =
    let val = getIntFromBS $ last $ BSC.split ' ' p
        grp = foldr (\x acc -> getIntFromBS x : acc) [] $ BSC.split ' ' v
    in (val, grp) : createGroups xs

solve :: (Int, [Int]) -> String
solve (k, v) = intercalate " " $ if null res then ["-1"] else res
    where
        go n acc =
            if length (elemIndices n v) > k
                then show n : acc
                else          acc
        res = foldr go [] (nub' v)

fullSolve :: [BSC.ByteString] -> [String]
fullSolve xs = foldl' (\acc tupla -> acc ++ [solve tupla]) [] $ createGroups xs

main = do
    BSC.getContents >>= mapM_ putStrLn . fullSolve . drop 1 . BSC.lines

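I would like to know where this code can be improved. I have tried many variants using maps, vectors, and parsing versus not parsing the strings read from the file into Int, but the code shown is the best I have.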

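3 answers:

Answer 0: (score: 1)

If I had to solve this problem, I would probably first try Data.Map.Strict (for O(log n) modification), hidden inside the operations of a Control.Monad.State.Strict monad transformer.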

import Data.Map.Strict
import Control.Monad.State.Strict

type SIO x = StateT (Map String Int) IO x

incCount :: String -> Int -> Int -> Int
incCount _ _ old_val = 1 + old_val

incAndGetCount :: String -> SIO Int
incAndGetCount s = fmap unMaybe $ state $ insertLookupWithKey incCount s 1
    where unMaybe (Just x) = x + 1
          unMaybe Nothing = 1

processKey :: String -> SIO ()
processKey s = do
    ct <- incAndGetCount s
    if ct == 5 then lift (putStrLn s) else return ()

process :: [String] -> IO ()
process list = evalStateT (mapM_ processKey list) empty

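While I find this code more elegant, I cannot know whether it is faster without actually seeing the test data. In any case, it is the equivalent of an imperative loop that puts the string into a dictionary, retrieves how many times it has been seen so far, and then prints the string to standard output if that number is 5.

Of course, you will need to combine it with an appropriate main function.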
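To make the role of unMaybe concrete, here is a minimal sketch of the counting step on its own, without the StateT wrapper. The names bump and countsSoFar are hypothetical and not part of the answer; only insertLookupWithKey is taken from it. What it illustrates is that insertLookupWithKey returns the value stored before the insertion, which is why one has to be added to obtain the current count.

```haskell
import qualified Data.Map.Strict as M
import Data.List (mapAccumL)

-- Hypothetical helper: record one more occurrence of a key and return
-- the count including this occurrence together with the updated map.
bump :: String -> M.Map String Int -> (Int, M.Map String Int)
bump s m =
  case M.insertLookupWithKey (\_ _ old -> old + 1) s 1 m of
    (Nothing,  m') -> (1, m')        -- key was absent: first occurrence
    (Just old, m') -> (old + 1, m')  -- lookup yields the value before the update

-- Hypothetical helper: for each input string, how often it has been seen so far.
countsSoFar :: [String] -> [Int]
countsSoFar = snd . mapAccumL step M.empty
  where
    step m s = let (c, m') = bump s m in (m', c)
```

In GHCi, countsSoFar ["a","b","a","a"] evaluates to [1,1,2,3]. The answer's processKey prints a string at the position where this running count reaches 5.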

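Answer 1: (score: 1)

Although I tried Data.Map at the beginning, that attempt lacked the optimization pointed out in the comments about using a fold together with the map, and it also lacked the required output order (order of appearance). The final solution is the following: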

{-# OPTIONS_GHC -O2 #-}
import Control.Monad (liftM, replicateM_)
import Data.Maybe (fromJust)
import Data.List (foldl', sort, unwords)
import qualified Data.Map.Strict as M
import qualified Data.ByteString.Char8 as BSC

getIntFromBS :: BSC.ByteString -> Int
getIntFromBS = fst.fromJust.BSC.readInt

solve :: Int -> [Int] -> String
solve k = unwords . map snd . sort . map finalPair . filter hasHighFreq . M.toList . foldl' insMap M.empty . zip [0..]
    where
        f _ _ (i, old_value) = (i, old_value + 1)
        insMap m' (i, x) = M.insertWithKey f x (i,1) m'
        hasHighFreq (_, (_, freq)) = freq >= k
        finalPair (val, (i, freq)) = (i, show val)

main = do
    n <- liftM getIntFromBS BSC.getLine
    replicateM_ n $ do
        [_, k] <- liftM (map getIntFromBS . BSC.words) BSC.getLine
        vals   <- liftM (map getIntFromBS . BSC.words) BSC.getLine
        let res = solve k vals
        putStrLn (if null res then "-1" else res)
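As a worked example of the idea behind solve, the sketch below separates the two stages: building a map from each value to its first index and its count, then filtering and sorting by that index. The names firstIndexAndCount and atLeast are hypothetical, the sample input is made up for illustration, and the threshold test follows this answer's `>=`. Forcing the count with seq is an added assumption on my part: Data.Map.Strict only evaluates the stored tuple to weak head normal form, so the count inside it would otherwise stay a thunk.

```haskell
import Data.List (foldl', sortOn)
import qualified Data.Map.Strict as M

-- value -> (index of its first occurrence, number of occurrences)
firstIndexAndCount :: [Int] -> M.Map Int (Int, Int)
firstIndexAndCount = foldl' ins M.empty . zip [0 ..]
  where
    ins m (i, x) = M.insertWith keepFirst x (i, 1) m
    -- keep the index already stored, increment the count and force it
    keepFirst _new (i0, c) = let c' = c + 1 in c' `seq` (i0, c')

-- values seen at least k times, ordered by first occurrence
atLeast :: Int -> [Int] -> [Int]
atLeast k =
  map fst . sortOn (fst . snd) . filter ((>= k) . snd . snd)
    . M.toList . firstIndexAndCount
```

For the input [4,5,2,5,4,3,1,3,4], the map holds 1 -> (6,1), 2 -> (2,1), 3 -> (5,2), 4 -> (0,3), 5 -> (1,2), and atLeast 2 returns [4,5,3].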

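Answer 2: (score: 0)

Edit: oops, this is wrong. The order produced is "as discovered", but it should be "as first seen in the original list". This can be fixed with indexing and sorting... unfortunately, that means it is no longer productive.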

Using mapAccumL, you can make this function more efficient and avoid any need for indexing and sorting:

import qualified Data.IntMap.Strict as M
import Data.List 

-- mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])

solve :: Int -> [Int] -> [Int]
solve k ns = concat . snd $ mapAccumL g M.empty ns
  where
    g m n = case u of         -- for each n in ns, with updating m,
              Nothing -> (M.insert n 1 m, [n | k==1])
              Just c -> (m2, [n | c==k-1])
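     where
        -- For IntMap, updateLookupWithKey returns the value *before* the update
        -- (unlike Data.Map), hence the comparison with k-1 above.
        (u,m2) = M.updateLookupWithKey (\_ c -> Just (c+1)) n m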

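We can produce an element as soon as its k-th instance is encountered in the input list. The final frequency map is ignored.

It is best to keep your functions focused. Once you have the list of Ints, you can pass it to unwords . map show or whatever you have.

Data.IntMap.Strict: "... benchmarks show that [IntMap] is (much) faster on insertions and deletions when compared to a generic size-balanced map implementation."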
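The difference between the two orders mentioned in the edit note at the top of this answer can be seen on a tiny input. The sketch below is not the answer's code: emitAtKth restates the same idea with findWithDefault and insert instead of updateLookupWithKey, and firstSeenOrder is a hypothetical two-pass reference that yields first-appearance order. Both names are made up for illustration.

```haskell
import qualified Data.IntMap.Strict as M
import qualified Data.IntSet as S
import Data.List (mapAccumL)

-- Emit n at the moment its running count reaches k ("as discovered").
emitAtKth :: Int -> [Int] -> [Int]
emitAtKth k = concat . snd . mapAccumL step M.empty
  where
    step m n =
      let c = M.findWithDefault 0 n m + 1   -- count including this occurrence
      in (M.insert n c m, [n | c == k])

-- Hypothetical reference: elements occurring at least k times, in order of
-- first appearance. It needs the complete counts before emitting anything.
firstSeenOrder :: Int -> [Int] -> [Int]
firstSeenOrder k ns = concat . snd $ mapAccumL step S.empty ns
  where
    counts = M.fromListWith (+) [(n, 1 :: Int) | n <- ns]
    step seen n
      | S.member n seen = (seen, [])
      | otherwise       = (S.insert n seen, [n | M.findWithDefault 0 n counts >= k])
```

With k = 2 and the input [1,2,2,1], emitAtKth gives [2,1] whereas firstSeenOrder gives [1,2]. On the other hand, emitAtKth is productive: take 3 (emitAtKth 2 (cycle [1,2,3])) returns [1,2,3] even though the input is infinite, which the two-pass version cannot do.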