I have the following code:
module Main where

import Data.IORef
import qualified Data.ByteString as S
import Control.Monad
import Control.Concurrent

main :: IO ()
main = do
    var <- newIORef False
    forkIO $ forever $ do
        status <- readIORef var
        if status
            then putStrLn "main: file was read"
            else putStrLn "main: file not yet read"
        threadDelay 10000
    threadDelay 200000
    putStrLn ">>! going to read file"
    --threadDelay 200000 --
    str <- S.readFile "large2"
    putStrLn ">>! finished reading file"
    writeIORef var True
    threadDelay 200000
I compile the code and run it:
$ ghc -threaded --make test.hs
$ dd if=/dev/urandom of=large bs=800000 count=1024
$ ./test +RTS -N3
<...>
main: file not yet read
main: file not yet read
main: file not yet read
main: file not yet read
>>! going to read file
>>! finished reading file
main: file was read
main: file was read
main: file was read
main: file was read
<...>
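That is, the program pauses while the file is being read. I find this confusing, because if I replace readFile with threadDelay, the main thread yields control correctly.
What is going on here? Doesn't GHC map forkIO'd code to different system threads?
(I am using Mac OS X 10.8.5, but people have reported the same behavior on Ubuntu and Debian.)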
Answer 0 (score: 8)
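I think the large allocation is triggering a garbage collection, but the collection itself cannot start until all threads are ready.
When you run into a problem like this, you can use ThreadScope to see what is happening. The event log for the code in the question looks like this:
[Figure: ThreadScope view of the event log]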
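To make the file read easy to spot in ThreadScope, it can help to bracket it with user events. The following is a minimal sketch rather than the code used for the screenshot: withEvents and its labels are hypothetical names, and the default path "large" is an assumption taken from the dd command in the question. traceEventIO comes from Debug.Trace in base.

```haskell
import qualified Data.ByteString as S
import Debug.Trace (traceEventIO)
import System.Environment (getArgs)

-- Hypothetical helper: wrap an action in a pair of user events so the
-- interval is easy to find in the event log.
withEvents :: String -> IO a -> IO a
withEvents label act = do
  traceEventIO ("START " ++ label)
  r <- act
  traceEventIO ("STOP " ++ label)
  return r

main :: IO ()
main = do
  args <- getArgs
  -- Path from the first argument; falls back to "large" (an assumption).
  let path = case args of
        (p:_) -> p
        []    -> "large"
  str <- withEvents "readFile" (S.readFile path)
  print (S.length str)
```

Assuming a reasonably recent GHC, building with -threaded -eventlog -rtsopts and running with +RTS -N3 -l should produce a .eventlog file that ThreadScope can open.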
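The problem is that we want to give the other threads a chance to run.
So instead of using S.readFile, we read the file in chunks and accumulate the result (or use a lazy ByteString). For example: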
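import qualified Data.ByteString as S
import System.IO

-- Read the file in chunks so other threads can run between reads.
readChunky :: FilePath -> IO S.ByteString
readChunky filename = withFile filename ReadMode $ \h -> go h S.empty
  where
    go h acc = do
        eof <- hIsEOF h
        case eof of
            True -> return acc
            False -> do
                v <- S.hGet h (4096 * 4096)  -- at most 16 MiB per read
                go h $ S.append acc v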
This works as expected.
See the graph:
[Figure: ThreadScope view of the chunked version]
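The lazy ByteString alternative mentioned above could look roughly like the sketch below. It assumes that the whole file does not need to be in memory at once. lazyLength is a hypothetical helper, and the default path "large" is again taken from the dd command in the question. Data.ByteString.Lazy.readFile reads the file on demand in small chunks, so no single huge allocation is requested.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.List (foldl')
import System.Environment (getArgs)

-- Hypothetical helper: total size computed one chunk at a time, so
-- chunks that have been consumed can be collected.
lazyLength :: L.ByteString -> Int
lazyLength = foldl' (\n chunk -> n + S.length chunk) 0 . L.toChunks

main :: IO ()
main = do
  args <- getArgs
  -- Path from the first argument; falls back to "large" (an assumption).
  let path = case args of
        (p:_) -> p
        []    -> "large"
  contents <- L.readFile path
  print (lazyLength contents)
```

Whether this avoids the pause on a given machine is something to check with ThreadScope rather than assume.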
Answer 1 (score: 5)
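I have a workaround too, but I don't think it guarantees that the program won't block (I haven't managed to make it block yet, but others report that it still blocks on their machines). Run the following with +RTS -N -qg (with parallel GC enabled it blocks sometimes, but not always):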
module Main where

import Data.IORef
import qualified Data.ByteString as S
import Control.Monad
import Control.Concurrent

main :: IO ()
main = do
    done <- newEmptyMVar
    forkIO $ do
        var <- newIORef False
        forkIO $ forever $ do
            status <- readIORef var
            if status
                then putStrLn "main: file was read"
                else putStrLn "main: file not yet read"
            threadDelay 10000
        threadDelay 200000
        putStrLn ">>! going to read file"
        --threadDelay 200000 --
        _str <- S.readFile "large"
        putStrLn ">>! finished reading file"
        writeIORef var True
        threadDelay 200000
        putMVar done ()
    takeMVar done
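I don't yet have a theory as to why the GC waits on the system call. I can't seem to reproduce the problem using my own safe and unsafe bindings to sleep while adding performGC to the status loop.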
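For readers who want to repeat that experiment, a status loop with a forced collection might look like the sketch below. This is a guess at the shape of the test, not the answerer's actual code. statusStep is a hypothetical name, and performGC comes from System.Mem in base.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.Mem (performGC)

-- Hypothetical helper: one iteration of the status loop, with a forced
-- major collection before the flag is read.
statusStep :: IORef Bool -> IO String
statusStep var = do
  performGC
  status <- readIORef var
  return (if status then "main: file was read" else "main: file not yet read")

main :: IO ()
main = do
  var <- newIORef False
  _ <- forkIO $ forever $ do
    statusStep var >>= putStrLn
    threadDelay 10000
  threadDelay 100000
  -- Stand-in for the blocking call under test (file read or FFI sleep).
  writeIORef var True
  threadDelay 100000
```

Replacing the marked line with the call under test shows whether the status messages stop while it runs.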
Answer 2 (score: 1)
I don't think it is readFile itself, but rather the underlying ByteString operations. There are several unsafe FFI calls in Data.ByteString.Internal:
foreign import ccall unsafe "string.h strlen" c_strlen
    :: CString -> IO CSize

foreign import ccall unsafe "static stdlib.h &free" c_free_finalizer
    :: FunPtr (Ptr Word8 -> IO ())

foreign import ccall unsafe "string.h memchr" c_memchr
    :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)

foreign import ccall unsafe "string.h memcmp" c_memcmp
    :: Ptr Word8 -> Ptr Word8 -> CSize -> IO CInt

foreign import ccall unsafe "string.h memcpy" c_memcpy
    :: Ptr Word8 -> Ptr Word8 -> CSize -> IO (Ptr Word8)

foreign import ccall unsafe "string.h memset" c_memset
    :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)

foreign import ccall unsafe "static fpstring.h fps_reverse" c_reverse
    :: Ptr Word8 -> Ptr Word8 -> CULong -> IO ()

foreign import ccall unsafe "static fpstring.h fps_intersperse" c_intersperse
    :: Ptr Word8 -> Ptr Word8 -> CULong -> Word8 -> IO ()

foreign import ccall unsafe "static fpstring.h fps_maximum" c_maximum
    :: Ptr Word8 -> CULong -> IO Word8

foreign import ccall unsafe "static fpstring.h fps_minimum" c_minimum
    :: Ptr Word8 -> CULong -> IO Word8

foreign import ccall unsafe "static fpstring.h fps_count" c_count
    :: Ptr Word8 -> CULong -> Word8 -> IO CULong
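These unsafe calls are faster than safe calls (each call has very little overhead), but they block the Haskell runtime system (including threads) until they complete.
I'm not 100% sure this is the cause of the delay you are seeing, but it is the first thing that came to mind.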
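The difference between the two kinds of import can be explored with a small experiment like the one below. It is only a sketch: c_sleep_safe and c_sleep_unsafe are hypothetical names for bindings to the POSIX sleep function, and this is not code from the bytestring library. Nothing here proves that the FFI calls listed above cause the pause in the question.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Foreign.C.Types (CUInt(..))

-- A safe call lets the runtime keep scheduling other Haskell threads.
foreign import ccall safe "unistd.h sleep"
  c_sleep_safe :: CUInt -> IO CUInt

-- An unsafe call is cheaper, but holds its capability until it returns.
foreign import ccall unsafe "unistd.h sleep"
  c_sleep_unsafe :: CUInt -> IO CUInt

main :: IO ()
main = do
  -- Background ticker, playing the role of the status loop.
  _ <- forkIO $ forever $ do
    putStrLn "tick"
    threadDelay 100000
  putStrLn ">>! safe sleep"
  -- sleep returns the unslept seconds if interrupted; ignored here.
  _ <- c_sleep_safe 1
  putStrLn ">>! unsafe sleep"
  _ <- c_sleep_unsafe 1
  putStrLn ">>! done"
```

With a single capability I would expect the ticks to stop during the unsafe call. With several capabilities (for example +RTS -N3) the ticker may keep running on another capability until a garbage collection needs every capability to stop, so it is worth comparing both settings.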