Haskell - Parallel word count using the mapReduce framework from Real World Haskell (Control.Parallel.Strategies)

Asked: 2014-11-24 23:06:58

Tags: haskell concurrency parallel-processing mapreduce

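I am a student working on an assignment about parallelism and concurrency in Haskell. As part of the assignment we were given this code (originally from chapter 24 of Real World Haskell) and asked to use it to write a parallel word count program: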

-- file: ch24/MapReduce.hs
mapReduce
    :: Strategy b    -- evaluation strategy for mapping
    -> (a -> b)      -- map function
    -> Strategy c    -- evaluation strategy for reduction
    -> ([b] -> c)    -- reduce function
    -> [a]           -- list to map over
    -> c

-- file: ch24/MapReduce.hs
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where mapResult    = parMap mapStrat mapFunc input
        reduceResult = reduceFunc mapResult `using` reduceStrat

Sequential version:

I wrote a sequential version of the program, and it works:

import System.Environment  
import System.IO  
import System.Directory 
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&)) 
import Data.Map as Map

simpleMapReduce
    :: (a -> b)      -- map function
    -> ([b] -> c)    -- reduce function
    -> [a]           -- list to map over
    -> c             -- result
simpleMapReduce mapFunc reduceFunc  = reduceFunc . Prelude.map mapFunc  

stringToWordCountMap :: String -> Map.Map String Int
stringToWordCountMap  = Map.fromList . Prelude.map (head &&& length) . group . sort . words . Prelude.map toLower 

combineWordCountMaps :: Map.Map String Int -> Map.Map String Int -> Map.Map String Int
combineWordCountMaps map1 map2 = Map.unionWith (+) map1 map2

reduceWordCountMaps :: [Map.Map String Int] -> Map.Map String Int
reduceWordCountMaps  (x:[]) = x
reduceWordCountMaps (x:xs) = combineWordCountMaps x (reduceWordCountMaps xs)

main = do (fileName:_) <- getArgs
          fileExists <- doesFileExist fileName
          if fileExists
              then do contents <- readFile fileName
                      let fileInLines = lines contents
                          result = simpleMapReduce stringToWordCountMap reduceWordCountMaps fileInLines
                      putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
                      putStrLn $ "result = " ++ show result ++ "."
              else do putStrLn "The file doesn't exist!"
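
A side note on the reducer above: it has no equation for the empty list, so an empty file (for which lines returns []) would fail with a non-exhaustive pattern error. Below is a minimal sketch of a total variant, assuming the same word-count representation. Map.unionsWith and Map.fromListWith are standard Data.Map functions; the qualified import is a choice made for this sketch and is not part of the original program.

```haskell
import Data.Char (toLower)
import qualified Data.Map as Map

-- Count the words of one string; fromListWith (+) sums the 1s per word.
stringToWordCountMap :: String -> Map.Map String Int
stringToWordCountMap s =
    Map.fromListWith (+) [(w, 1) | w <- words (map toLower s)]

-- Total reducer: unionsWith (+) on [] yields Map.empty,
-- so an empty input no longer causes a pattern-match failure.
reduceWordCountMaps :: [Map.Map String Int] -> Map.Map String Int
reduceWordCountMaps = Map.unionsWith (+)

main :: IO ()
main = do
    print (reduceWordCountMaps (map stringToWordCountMap ["The cat", "the dog"]))
    print (reduceWordCountMaps [])
```

Both functions keep the signatures used in the question, so they could be dropped into simpleMapReduce or mapReduce unchanged.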

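I am now trying to figure out how to make it run in parallel using the given framework.

What I have done so far:

Here is my attempt at a parallel version of the above, which I cannot even get to compile:

(I am compiling with ghc -threaded -rtsopts -eventlog part02.hs)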

import System.Environment  
import System.IO
import System.Directory 
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&)) 
import Data.Map as Map
import Control.Parallel
import Control.Parallel.Strategies

mapReduce
    :: Strategy b    -- evaluation strategy for mapping
    -> (a -> b)      -- map function
    -> Strategy c    -- evaluation strategy for reduction
    -> ([b] -> c)    -- reduce function
    -> [a]           -- list to map over
    -> c

-- file: ch24/MapReduce.hs
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where mapResult    = Control.Parallel.Strategies.parMap mapStrat mapFunc input
        reduceResult = reduceFunc mapResult `using` reduceStrat

stringToWordCountMap :: String -> Map.Map String Int
stringToWordCountMap  = Map.fromList . Prelude.map (head &&& length) . group . sort . words . Prelude.map toLower 

combineWordCountMaps :: Map.Map String Int -> Map.Map String Int -> Map.Map String Int
combineWordCountMaps map1 map2 = Map.unionWith (+) map1 map2

reduceWordCountMaps :: [ Map.Map String Int] -> Map.Map String Int
reduceWordCountMaps  (x:[]) = x
reduceWordCountMaps  (x:xs) = combineWordCountMaps x (reduceWordCountMaps xs)

main = do (fileName:_) <- getArgs  
          fileExists <- doesFileExist fileName  
          if fileExists  
              then do contents <- readFile fileName  
                  let fileInLines = lines contents
              result = mapReduce Control.Parallel.Strategies.parMap stringToWordCountMap Control.Parallel.Strategies.parList reduceWordCountMaps fileInLines

                      putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
              putStrLn $ "result = " ++ show result ++ "."
              else do putStrLn "The file doesn't exist!"  

Edit - compile error messages:

part02.hs:43:46:
    Couldn't match type `(a0 -> b0) -> [a0] -> [b0]'
                  with `Eval (Strategy b0)'
    Expected type: Strategy (Strategy b0)
      Actual type: Strategy b0 -> (a0 -> b0) -> [a0] -> [b0]
    In the first argument of `mapReduce', namely `parMap'
    In the expression:
      mapReduce
        parMap stringToWordCountMap parList reduceWordCountMaps fileInLines
    In an equation for `result':
        result
          = mapReduce
              parMap stringToWordCountMap parList reduceWordCountMaps fileInLines

part02.hs:43:81:
    Couldn't match type `Map String Int' with `b0 -> Eval b0'
    Expected type: String -> Strategy b0
      Actual type: String -> Map String Int
    In the second argument of `mapReduce', namely
      `stringToWordCountMap'
    In the expression:
      mapReduce
        parMap stringToWordCountMap parList reduceWordCountMaps fileInLines
    In an equation for `result':
        result
          = mapReduce
              parMap stringToWordCountMap parList reduceWordCountMaps fileInLines

part02.hs:43:102:
    Couldn't match type `[a1] -> Eval [a1]' with `Eval (Strategy a1)'
    Expected type: Strategy (Strategy a1)
      Actual type: Strategy a1 -> Strategy [a1]
    In the third argument of `mapReduce', namely `parList'
    In the expression:
      mapReduce
        parMap stringToWordCountMap parList reduceWordCountMaps fileInLines
    In an equation for `result':
        result
          = mapReduce
              parMap stringToWordCountMap parList reduceWordCountMaps fileInLines

part02.hs:43:138:
    Couldn't match type `Map String Int' with `b0 -> Eval b0'
    Expected type: [Strategy b0] -> Strategy a1
      Actual type: [Map String Int] -> Map String Int
    In the fourth argument of `mapReduce', namely `reduceWordCountMaps'
    In the expression:
      mapReduce
        parMap stringToWordCountMap parList reduceWordCountMaps fileInLines
    In an equation for `result':
        result
          = mapReduce
              parMap stringToWordCountMap parList reduceWordCountMaps fileInLines

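Could anyone take a look at this and help me get it working? Apologies if I am missing something obvious; I still do not have much experience with Haskell. I find strategies a bit confusing, so any links or resources would also be appreciated. Many thanks.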

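Final edit:

From user5402's answer:

When I copy-pasted the code, I also got some indentation errors. A version that compiles with only warnings is available here.

In case it disappears, the code is pasted below: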

import System.Environment  
import System.IO
import System.Directory 
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&)) 
import Data.Map as Map
import Control.Parallel
import Control.Parallel.Strategies

mapReduce
    :: Strategy b    -- evaluation strategy for mapping
    -> (a -> b)      -- map function
    -> Strategy c    -- evaluation strategy for reduction
    -> ([b] -> c)    -- reduce function
    -> [a]           -- list to map over
    -> c

-- file: ch24/MapReduce.hs
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where mapResult    = parMap mapStrat mapFunc input
        reduceResult = reduceFunc mapResult `using` reduceStrat

stringToWordCountMap :: String -> Map.Map String Int
stringToWordCountMap  = Map.fromList . Prelude.map (head &&& length) . group . sort . words . Prelude.map toLower 

combineWordCountMaps :: Map.Map String Int -> Map.Map String Int -> Map.Map String Int
combineWordCountMaps map1 map2 = Map.unionWith (+) map1 map2

reduceWordCountMaps :: [ Map.Map String Int] -> Map.Map String Int
reduceWordCountMaps  (x:[]) = x
reduceWordCountMaps  (x:xs) = combineWordCountMaps x (reduceWordCountMaps xs)

main = do (fileName:_) <- getArgs  
          fileExists <- doesFileExist fileName  
          if fileExists  
              then do contents <- readFile fileName  
                      let fileInLines = lines contents
                          result = mapReduce rpar stringToWordCountMap rpar reduceWordCountMaps fileInLines

                      putStrLn $ "The file has " ++ show (length (lines contents)) ++ " lines!"
                      putStrLn $ "result = " ++ show result ++ "."
              else do putStrLn "The file doesn't exist!"  

1 answer:

Answer 0 (score: 1)

On this line:

result = mapReduce Control.Parallel.Strategies.parMap stringToWordCountMap Control.Parallel.Strategies.parList reduceWordCountMaps fileInLines

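For each Strategy argument you should supply a value such as rpar or rseq. See the Control.Parallel.Strategies documentation for the other predefined strategies. So the line above should look like this: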

result = mapReduce rpar stringToWordCountMap rpar reduceWordCountMaps fileInLines
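
The type errors follow from the signature: Strategy a is a synonym for a -> Eval a, so mapReduce expects a strategy value in its first and third positions, whereas parMap and parList are functions that take a strategy as an argument. The sketch below illustrates this with the same framework. countWords and wordCount are hypothetical names introduced for illustration, and rdeepseq is swapped in for rpar on the assumption that each per-line map should be fully evaluated inside its spark (rpar only sparks evaluation to weak head normal form).

```haskell
import Control.Parallel (pseq)
import Control.Parallel.Strategies (Strategy, parMap, rdeepseq, rseq, using)
import qualified Data.Map as M

-- Same framework as in the question.
mapReduce :: Strategy b -> (a -> b) -> Strategy c -> ([b] -> c) -> [a] -> c
mapReduce mapStrat mapFunc reduceStrat reduceFunc input =
    mapResult `pseq` reduceResult
  where mapResult    = parMap mapStrat mapFunc input
        reduceResult = reduceFunc mapResult `using` reduceStrat

-- Hypothetical helper: count the words of a single line.
countWords :: String -> M.Map String Int
countWords line = M.fromListWith (+) [(w, 1) | w <- words line]

-- rdeepseq :: NFData a => Strategy a                 -- a strategy value
-- parMap   :: Strategy b -> (a -> b) -> [a] -> [b]   -- a function, not a strategy
wordCount :: [String] -> M.Map String Int
wordCount = mapReduce rdeepseq countWords rseq (M.unionsWith (+))

main :: IO ()
main = print (wordCount ["a b a", "b c"])
-- prints: fromList [("a",2),("b",2),("c",1)]
```

To actually use several cores, the program has to be compiled with -threaded and run with +RTS -N. Whether this yields a speedup depends on the input, since one spark per line can be very fine-grained.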

When I copy-pasted the code, I also got some indentation errors. A version that compiles with only warnings is available here.

Note: there is no need to fully qualify names such as parMap, rpar, etc. If there are conflicts, consider using qualified imports with abbreviations.
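
As an illustration of that last point: the unqualified import Data.Map as Map brings names such as map into scope alongside the Prelude ones, which is why the code above has to write Prelude.map. A minimal sketch with a qualified, abbreviated import (the abbreviation M is an arbitrary choice for this example):

```haskell
import Control.Arrow ((&&&))
import Data.Char (toLower)
import Data.List (group, sort)
import qualified Data.Map as M

-- With Data.Map imported qualified, plain `map` is unambiguously Prelude's.
stringToWordCountMap :: String -> M.Map String Int
stringToWordCountMap =
    M.fromList . map (head &&& length) . group . sort . words . map toLower

main :: IO ()
main = print (stringToWordCountMap "b A a")
-- prints: fromList [("a",2),("b",1)]
```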