I have just started learning concurrent programming. In the case below, the threaded build uses more total memory, which is expected. However, it allocates less heap than the non-threaded program. Isn't that strange? Or is it simply that a program compiled with the -threaded option performs poorly when run with the -N1 option?
I cannot understand why the threaded case allocates less heap. My guess is that thread switching produces this; a minimal sketch of the pattern I suspect follows. Can you explain what is happening?
I don't think this program is special, and its input (the schedules) is small.
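To make my guess concrete, here is a minimal sketch, written by me and not part of the program below, of the polling pattern that both the scheduler and the threads use. Every trip around the loop polls an MVar with `readMVar`, and as far as I understand each iteration allocates a little heap, so the total allocation should grow with the number of spin iterations:

import Control.Concurrent

-- Busy-wait until the shared counter reaches the target value.
-- Nothing blocks here: the loop keeps polling (and, I believe,
-- allocating) until some other thread updates the MVar.
spinUntil :: Int -> MVar Int -> IO ()
spinUntil target mv = loop
  where
    loop = do
        now <- readMVar mv
        if now == target
          then return ()
          else loop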
import Control.Concurrent
import Control.Monad
import System.IO
{-
This code hangs at "Start Scheduler" when compiled with an "-O*"
option unless `hFlush stdout` is added. That problem is described
in another question (http://stackoverflow.com/questions/33311980).
In this code, the problem isn't fixed by weighting the processing
load, so please do not compile this code with an optimize option.
-}
main = do
    count    <- newMVar 0
    clock    <- newMVar 0
    finished <- newMVar 0
    out      <- newMVar ()
    mapM_ (\thID -> forkIO $ aThread thID count out clock finished)
          [0 .. length schedules - 1]
    wmPutStrLn out "Start Scheduler"
    scheduler out count clock finished (length schedules)
  where
    scheduler out c clk fin thNum = loop
      where
        loop = do
            finished <- readMVar fin
            if finished == thNum
              then return ()
              else do
                now <- readMVar c
                when (now == thNum - finished) $ do
                    _ <- swapMVar c 0 -- I don't like this.
                    modifyMVar_ clk (return . (+1))
                    nClk <- readMVar clk
                    wmPutStrLn out $ "Clock up: " ++ show nClk
                loop

    -- A thread. I think you don't care about the details; it is
    -- just a processing load.
    aThread thID c out clk fin = do
        wmPutStrLn out "A thread is initialized"
        loop (schedules !! thID) 0
      where
        loop aSch pClk =
            if null aSch
              then modifyMVar_ fin (return . (+1))
              else do
                now <- readMVar clk
                if pClk == now
                  then do
                    modifyMVar_ c (return . (+1))
                    let (t, aMsg) = head aSch
                    if now == t
                      then do
                        wmPutStrLn out aMsg
                        loop (tail aSch) (pClk + 1)
                      else loop aSch (pClk + 1)
                  else loop aSch pClk
schedules :: [[(Int, String)]]
schedules =
    [ [(1, "A1"), (2, "A2")]
    , [(0, "B0"), (3, "B3")]
    , [(1, "C1"), (2, "C2"), (3, "C3")]
    , [(0, "D0"), (2, "D2")]
    , [(0, "E0"), (1, "E1"), (2, "E2")]
    , [(5, "F5")]
    , []
    , [(4, "H4"), (6, "H6"), (8, "H8")]
    ]

wmPutStrLn mv str = withMVar mv $ \_ -> putStrLn str
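As the comment at the top says, the program hangs at "Start Scheduler" when built with "-O*" unless the output is flushed. The workaround mentioned there would look roughly like this; the placement is my assumption, since the posted code does not include it:

-- Hypothetical variant of wmPutStrLn with the `hFlush stdout`
-- workaround from the comment above (not in the posted code).
wmPutStrLnFlushed :: MVar () -> String -> IO ()
wmPutStrLnFlushed mv str = withMVar mv $ \_ -> do
    putStrLn str
    hFlush stdout

For the runs below, the binary was built without optimization and with the threaded runtime, with something like `ghc -threaded -rtsopts Scheduler04.toAsk.hs` (the exact command is my assumption; I believe `-rtsopts` is what allows `+RTS -s`).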
Compare these outputs:
$ ./Scheduler04.toAsk +RTS -s -N1 > /dev/null
3,446,295,648 bytes allocated in the heap
4,112,880 bytes copied during GC
56,776 bytes maximum residency (2 sample(s))
21,048 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6636 colls, 0 par 0.020s 0.024s 0.0000s 0.0007s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 1.246s ( 1.266s elapsed)
GC time 0.020s ( 0.024s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 1.267s ( 1.291s elapsed)
Alloc rate 2,766,218,149 bytes per MUT second
Productivity 98.4% of total user, 96.5% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
$ ./Scheduler04.toAsk +RTS -s -N2 > /dev/null
1,027,098,440 bytes allocated in the heap
1,070,584 bytes copied during GC
70,000 bytes maximum residency (2 sample(s))
34,568 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1048 colls, 1048 par 0.060s 0.005s 0.0000s 0.0001s
Gen 1 2 colls, 1 par 0.000s 0.000s 0.0001s 0.0001s
Parallel GC work balance: 22.49% (serial 0%, perfect 100%)
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.494s ( 0.285s elapsed)
GC time 0.060s ( 0.005s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.555s ( 0.291s elapsed)
Alloc rate 2,079,458,137 bytes per MUT second
Productivity 89.2% of total user, 170.2% of total elapsed
gc_alloc_block_sync: 20
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
$ ./Scheduler04.toAsk +RTS -s -N4 > /dev/null
277,518,568 bytes allocated in the heap
597,480 bytes copied during GC
96,400 bytes maximum residency (2 sample(s))
47,648 bytes maximum slop
3 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 297 colls, 297 par 0.247s 0.002s 0.0000s 0.0001s
Gen 1 2 colls, 1 par 0.000s 0.000s 0.0001s 0.0002s
Parallel GC work balance: 34.57% (serial 0%, perfect 100%)
TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 0.178s ( 0.109s elapsed)
GC time 0.247s ( 0.002s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.427s ( 0.112s elapsed)
Alloc rate 1,561,830,669 bytes per MUT second
Productivity 41.9% of total user, 159.5% of total elapsed
gc_alloc_block_sync: 106
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 1
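One more observation, with my own arithmetic on numbers copied from the outputs above: dividing the bytes allocated by the MUT time gives the same order of magnitude in all three runs, matching the "Alloc rate" lines. So the total allocation seems to track how long the mutator runs, which is why I suspect the spinning threads:

-- Bytes allocated divided by MUT time; numbers copied from above.
allocPerMutSec :: [(String, Double)]
allocPerMutSec =
    [ ("-N1", 3446295648 / 1.246) -- ~2.77e9, cf. 2,766,218,149 above
    , ("-N2", 1027098440 / 0.494) -- ~2.08e9, cf. 2,079,458,137
    , ("-N4",  277518568 / 0.178) -- ~1.56e9, cf. 1,561,830,669
    ]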