Question

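I have been working on a path tracer using the Repa library. I recently refactored it to use the monadic computeP in order to take advantage of parallelism. However, I found the performance gain to be negligible. Moreover, watching htop, the program still seems to use only one CPU. To dig into the problem, I opened ghci and ran the following: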

~
❯ stack ghci --package repa
Configuring GHCi with the following packages: 
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /tmp/ghci12667/ghci-script
Prelude> import Data.Array.Repa
Prelude Data.Array.Repa> import System.Random
Prelude Data.Array.Repa System.Random> randomList = randoms (mkStdGen 0)
Prelude Data.Array.Repa System.Random> shape = (Z :. 1000000)
Prelude Data.Array.Repa System.Random> array = fromFunction shape $ \(Z :. i) -> randomList !! i
Prelude Data.Array.Repa System.Random> sumP array

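No dice. According to htop, Repa still appears to use only one CPU core.

Furthermore, the execution time barely differs between sumP and sumS, slightly favoring sumS: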

Prelude Data.Array.Repa System.Random> array = fromListUnboxed (Z :. 1000000) $ take 1000000 $ randoms (mkStdGen 0)
(0.01 secs, 0 bytes)
Prelude Data.Array.Repa System.Random> sumP array
AUnboxed Z [500140.92257232184]
(0.99 secs, 1,916,158,952 bytes)
Prelude Data.Array.Repa System.Random> sumS array
AUnboxed Z [500140.92257232184]
(0.93 secs, 2,348,156,248 bytes)
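
Timings taken inside GHCi may not be representative, since the interpreter does not optimize the code typed at the prompt. A minimal sketch of the same comparison as a standalone program follows; it assumes the repa and random packages are available, and the helper name sampleArray and the file name Bench.hs are hypothetical.

```haskell
import Data.Array.Repa
import System.Random (mkStdGen, randoms)

-- | Hypothetical helper: an unboxed 1-D array of n pseudo-random Doubles,
-- built the same way as in the GHCi session above.
sampleArray :: Int -> Array U DIM1 Double
sampleArray n = fromListUnboxed (Z :. n) (take n (randoms (mkStdGen 0)))

main :: IO ()
main = do
  let arr = sampleArray 1000000
  -- sumP runs in a monad and may split the work across Repa's worker threads.
  parResult <- sumP arr
  putStrLn "sumP:"
  print (toList parResult)
  -- sumS is the purely sequential counterpart.
  putStrLn "sumS:"
  print (toList (sumS arr))
```

Assuming a plain GHC setup, this could be built with `ghc -O2 -threaded -rtsopts Bench.hs` and run with `./Bench +RTS -N -s`; the `-s` summary reports elapsed versus CPU time, which gives a rough indication of whether more than one core was busy.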

What am I missing? In case it matters, I am on Arch Linux:

~
❯ uname -a
Linux roskolnikov 4.11.9-1-ARCH #1 SMP PREEMPT Wed Jul 5 18:23:08 CEST 2017 x86_64 GNU/Linux

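Update

Some comments suggested that I should be using the -threaded ghc option, as indicated in the Repa documentation. I was under the (mistaken?) impression that ghci used -threaded. In any case, my program already uses these flags. Here is a snippet of the .cabal file: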

executable write
  hs-source-dirs:      app
  main-is:             Write.hs
  ghc-options:         -Odph 
                       -rtsopts 
                       -threaded 
                       -fno-liberate-case 
                       -funfolding-use-threshold1000 
                       -funfolding-keeness-factor1000 
                       -fllvm 
                       -optlo-O3
  build-depends:       base 
                     , pathtracer
                     , repa
                     , JuicyPixels
  default-language:    Haskell2010

In addition, I re-ran the commands in ghci with (what I believe are) the correct ghci options:

~
❯ stack ghci\
 --package repa\
 --ghc-options -Odph\
 --ghc-options -rtsopts\
 --ghc-options -with-rtsopts=-N\
 --ghc-options -threaded\
 --ghc-options -fno-liberate-case\
 --ghc-options -funfolding-use-threshold1000\
 --ghc-options -funfolding-keeness-factor1000\
 --ghc-options -fllvm\
 --ghc-options -optlo-O3

Configuring GHCi with the following packages: 

when making flags consistent: warning:
    -O conflicts with --interactive; -O ignored.
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /tmp/ghci31252/ghci-script
Prelude> import Data.Array.Repa
Prelude Data.Array.Repa> import System.Random
Prelude Data.Array.Repa System.Random> randomList = randoms (mkStdGen 0)
Prelude Data.Array.Repa System.Random> shape = (Z :. 1000000)
Prelude Data.Array.Repa System.Random> array = fromFunction shape $ \(Z :. i) -> randomList !! i
Prelude Data.Array.Repa System.Random> sumP array

Still no dice.

I would greatly appreciate any further help with this.

Answer 1

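For whatever reason, ghci seems to ignore some of the options passed to it, so monadic computations like sumP only use one CPU core. However, the point of this experiment was to use multiple cores for a personal project I am working on, and I have succeeded in that goal. I think the key was adding -with-rtsopts=-N to ghc-options in the .cabal file. The final ghc-options are as follows: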

executable write
  hs-source-dirs:      app
  main-is:             Write.hs
  ghc-options:         -Odph 
                       -rtsopts 
                       -with-rtsopts=-N
                       -threaded 
                       -fno-liberate-case 
                       -funfolding-use-threshold1000 
                       -funfolding-keeness-factor1000 
                       -fllvm 
                       -optlo-O3
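
One way to check whether a build actually picked up these flags is to ask the runtime itself. The sketch below uses only base; parallelReady is a hypothetical helper encoding the assumption that parallel Repa operations can only spread across cores when the threaded RTS is linked in and more than one capability is configured.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import GHC.Conc (getNumProcessors)

-- | Hypothetical helper: True when the program was linked with -threaded
-- and the RTS was given more than one capability (for example via -N).
parallelReady :: Bool -> Int -> Bool
parallelReady threaded caps = threaded && caps > 1

main :: IO ()
main = do
  caps  <- getNumCapabilities   -- number of capabilities the RTS is using
  procs <- getNumProcessors     -- number of processors the machine reports
  putStrLn ("threaded RTS:   " ++ show rtsSupportsBoundThreads)
  putStrLn ("capabilities:   " ++ show caps)
  putStrLn ("processors:     " ++ show procs)
  putStrLn ("parallel ready: " ++ show (parallelReady rtsSupportsBoundThreads caps))
```

If this reports a single capability, the -N runtime option was probably not applied, either because -with-rtsopts=-N is missing from ghc-options or because the program was not started with +RTS -N.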

Why does (Haskell) Repa only use one CPU?
