Question

Repa and Accelerate API similarities

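The Haskell repa library is for automatically parallel array computation on CPUs. The accelerate library provides automatic data parallelism on GPUs. The APIs are quite similar, with identical representations of N-dimensional arrays. One can even switch between accelerate and repa arrays with fromRepa and toRepa in Data.Array.Accelerate.IO: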

fromRepa :: (Shapes sh sh', Elt e) => Array A sh e -> Array sh' e
toRepa   :: Shapes sh sh'          => Array sh' e  -> Array A sh e

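There are multiple backends for accelerate, including LLVM, CUDA and FPGA (see Figure 2 of http://www.cse.unsw.edu.au/~keller/Papers/acc-cuda.pdf). I have also spotted a repa backend for accelerate, though that library does not appear to be maintained. Given that the repa and accelerate programming models are similar, I am hopeful that there is an elegant way of switching between them, i.e. functions written once can be executed either with repa's R.computeP or with one of accelerate's backends, e.g. with the CUDA run function.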

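Two very similar functions: Repa and Accelerate on a pumpkin

Take a simple image-processing thresholding function: if a grayscale pixel value is less than 50, it is set to 0; otherwise it retains its value.

[Image: the threshold applied to a picture of a pumpkin]

The following code presents the repa and accelerate implementations: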

module Main where

import qualified Data.Array.Repa as R
import qualified Data.Array.Repa.IO.BMP as R
import qualified Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.IO as A
import qualified Data.Array.Accelerate.Interpreter as A

import Data.Word

-- Apply threshold over image using accelerate (interpreter)
thresholdAccelerate :: IO ()
thresholdAccelerate = do
  img <- either (error . show) id `fmap` A.readImageFromBMP "pumpkin-in.bmp"
  let newImg = A.run $ A.map evalPixel (A.use img)
  A.writeImageToBMP "pumpkin-out.bmp" newImg
    where
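      -- As originally posted, `if p > 50 then p else 0` failed at runtime with
      --   *** Exception: Prelude.Ord.compare applied to EDSL types
      -- because Haskell's (>) and if-then-else cannot inspect an embedded Exp.
      -- Accelerate's lifted comparison (>*) and conditional (?) are used here
      -- instead (operator names as in the Accelerate releases of that era).
      evalPixel :: A.Exp A.Word32 -> A.Exp A.Word32
      evalPixel p = (p A.>* 50) A.? (p, 0)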

-- Apply threshold over image using repa
thresholdRepa :: IO ()
thresholdRepa = do
  let arr :: IO (R.Array R.U R.DIM2 (Word8,Word8,Word8))
      arr = either (error . show) id `fmap` R.readImageFromBMP "pumpkin-in.bmp" 
  img <- arr
  newImg <- R.computeP (R.map applyAtPoint img)
  R.writeImageToBMP "pumpkin-out.bmp" newImg
  where
    applyAtPoint :: (Word8,Word8,Word8) -> (Word8,Word8,Word8)
    applyAtPoint (r,g,b) =
        let [r',g',b'] = map applyThresholdOnPixel [r,g,b]
        in (r',g',b')
    applyThresholdOnPixel x = if x > 50 then x else 0

data BackendChoice = Repa | Accelerate

main :: IO ()
main = do
  let userChoice = Repa -- pretend this is a command line flag
  case userChoice of
    Repa       -> thresholdRepa
    Accelerate -> thresholdAccelerate

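Question: can I write this just once?

The implementations of thresholdAccelerate and thresholdRepa are very similar. Is there an elegant way to write array-processing functions once, and then programmatically choose between multicore CPUs (repa) and GPUs (accelerate)? I can think of choosing my import according to whether I want the CPU or the GPU, i.e. importing either Data.Array.Accelerate.CUDA or Data.Array.Repa to execute an action of type Acc a with: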

run :: Arrays a => Acc a -> a

Or to use a type class, e.g. something roughly like:

main :: IO ()
main = do
  let userChoice = Repa -- pretend this is a command line flag
  action <- case userChoice of
    Repa       -> applyThreshold :: RepaBackend ()
    Accelerate -> applyThreshold :: CudaBackend ()
  action

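Or is it the case that, for each parallel array function I wish to express for both CPUs and GPUs, I must implement it twice: once with the repa library, and again with the accelerate library?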
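To make the type-class idea above a little more concrete, here is a minimal sketch that uses only base. It is not the Repa or Accelerate API: `PixelLang`, `Eval` and `Code` are hypothetical names invented for illustration. `Eval` stands in for a backend that runs on the host, and `Code` for a backend that generates code for a device. The kernel `threshold` is written once against the class and can be interpreted by either instance.

```haskell
module Main where

import Data.Word (Word8)

-- Hypothetical pixel-expression language: one method per primitive that
-- every backend must know how to interpret.
class PixelLang repr where
  lit  :: Word8 -> repr Word8
  gt   :: repr Word8 -> repr Word8 -> repr Bool
  cond :: repr Bool -> repr a -> repr a -> repr a

-- The kernel is written once, against the class only.
threshold :: PixelLang repr => repr Word8 -> repr Word8
threshold p = cond (p `gt` lit 50) p (lit 0)

-- Stand-in backend 1: evaluate directly on the host.
newtype Eval a = Eval { runEval :: a }

instance PixelLang Eval where
  lit = Eval
  gt (Eval x) (Eval y) = Eval (x > y)
  cond (Eval c) t e = if c then t else e

-- Stand-in backend 2: emit source text instead of computing a value.
newtype Code a = Code { runCode :: String }

instance PixelLang Code where
  lit n = Code (show n)
  gt (Code x) (Code y) = Code ("(" ++ x ++ " > " ++ y ++ ")")
  cond (Code c) (Code t) (Code e) =
    Code ("(" ++ c ++ " ? " ++ t ++ " : " ++ e ++ ")")

main :: IO ()
main = do
  -- Same kernel, evaluated on the host: prints [0,0,51,200]
  print (map (runEval . threshold . Eval) [10, 50, 51, 200])
  -- Same kernel, rendered as code: prints ((p > 50) ? p : 0)
  putStrLn (runCode (threshold (Code "p")))
```

The sketch also hints at why an embedded language such as Accelerate's Exp needs its own comparison and conditional operations: in the `Code` instance there is no Bool to branch on at the time the kernel is built, only a description of the test.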

Answer 1

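The short answer is that, at the moment, you unfortunately need to write both versions.

However, we are working on CPU support for Accelerate, which will obviate the need for the Repa version of the code. In particular, Accelerate has recently gained a new LLVM-based backend that targets both GPUs and CPUs: https://github.com/AccelerateHS/accelerate-llvm

This new backend is still incomplete, buggy, and experimental, but we are planning to make it into a viable alternative to the current CUDA backend.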
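As a rough illustration of where this is heading, the sketch below writes the threshold kernel once in Accelerate and picks a `run` function at runtime. It assumes the later Accelerate 1.x API (`A.cond`, Accelerate's own `A.>`) together with the `accelerate-llvm-native` and `accelerate-llvm-ptx` packages, none of which existed in this form when the answer was written. The names `Backend`, `runWith` and `thresholdAcc` are made up for the example, and a one-dimensional `Word8` vector stands in for the image.

```haskell
module Main where

import Data.Word (Word8)
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as Interp
import qualified Data.Array.Accelerate.LLVM.Native as CPU  -- accelerate-llvm-native
import qualified Data.Array.Accelerate.LLVM.PTX    as GPU  -- accelerate-llvm-ptx (needs CUDA)

-- Hypothetical runtime switch between backends.
data Backend = Interpreter | Cpu | Gpu

-- Each backend exposes a run function of the same shape, so the choice
-- can be made with an ordinary case split.
runWith :: A.Arrays a => Backend -> A.Acc a -> a
runWith Interpreter = Interp.run
runWith Cpu         = CPU.run
runWith Gpu         = GPU.run

-- The kernel is written once, as an embedded Accelerate computation.
thresholdAcc :: A.Acc (A.Vector Word8) -> A.Acc (A.Vector Word8)
thresholdAcc = A.map (\p -> A.cond (p A.> 50) p 0)

main :: IO ()
main = do
  let pixels  = A.fromList (A.Z A.:. 4) [10, 50, 51, 200] :: A.Vector Word8
      backend = Cpu -- pretend this is a command line flag
  print (A.toList (runWith backend (thresholdAcc (A.use pixels))))
```

Dropping the `GPU` import and the `Gpu` constructor should let the same program build on a machine without CUDA.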

Answer 2

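I thought about this a year and a few months ago, while designing yarr. At that time there were serious issues with type family inference, or something like that (I don't remember exactly), which prevented implementing a unifying wrapper over vector, repa, yarr, accelerate, etc. that was both efficient and did not require writing too many explicit type signatures, or perhaps prevented implementing it at all (I don't remember).

That was GHC 7.6. I don't know whether GHC 7.8 brought meaningful improvements in this area. In theory I don't see any obstacles, so we can expect something like this someday, sooner or later, once GHC is ready for it.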

Write a parallel array Haskell expression once, run on CPUs & GPUs with repa and accelerate
