I have the following Haskell code implementing a simple version of the `cat` unix command-line utility. Testing its performance with `time` on a 400MB file, it is about 3x slower than `cat`. (The exact script I use to test it is below the code.)
My questions are:
Regarding questions 2 and 3: I compiled with GHC's -prof flag and ran with +RTS -p, but I find the output uninformative here.
Source (Main.hs):
module Main where

import System.IO
import System.Environment
import Data.ByteString as BS
import Control.Monad

-- Copied from cat source code
bufsize = 1024*128

go handle buf = do
  hPut stdout buf
  eof <- hIsEOF handle
  unless eof $ do
    buf <- hGetSome handle bufsize
    go handle buf

main = do
  file <- fmap Prelude.head getArgs
  handle <- openFile file ReadMode
  buf <- hGetSome handle bufsize
  hSetBuffering stdin $ BlockBuffering (Just bufsize)
  hSetBuffering stdout $ BlockBuffering (Just bufsize)
  go handle buf
Timing script (run.sh):
#!/usr/bin/env bash
# Generate 10M lines of silly test data
yes aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | head -n 10000000 > huge
# Compile with optimisation
ghc -O2 Main.hs
# Run haskell
echo "timing Haskell"
time ./Main huge > /dev/null
echo ""
echo ""
# Run cat
echo "timing 'cat'"
time cat huge > /dev/null
My results:
timing Haskell
real 0m0.980s
user 0m0.296s
sys 0m0.684s
timing 'cat'
real 0m0.304s
user 0m0.001s
sys 0m0.302s
The profiling report from compiling with -prof and running with +RTS -p is as follows:
Sat Dec 13 21:26 2014 Time and Allocation Profiling Report (Final)
Main +RTS -p -RTS huge
total time = 0.92 secs (922 ticks @ 1000 us, 1 processor)
total alloc = 7,258,596,176 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 100.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 46 0 100.0 100.0 100.0 100.0
CAF GHC.Conc.Signal 84 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 81 0 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 76 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 70 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 69 0 0.0 0.0 0.0 0.0
Answer 0 (score: 15)
This is only a partial answer, trying to address the second question:
I tried something similar using the GHC.IO.Buffer API:
module Main where

import System.IO
import System.Environment
import GHC.IO.Buffer
import Data.ByteString as BS
import Control.Monad

-- Copied from cat source code
bufsize = 1024*128

go handle bufPtr = do
  read <- hGetBuf handle bufPtr bufsize
  when (read > 0) $ do
    hPutBuf stdout bufPtr read
    go handle bufPtr

main = do
  file <- fmap Prelude.head getArgs
  handle <- openFile file ReadMode
  buf <- newByteBuffer bufsize WriteBuffer
  withBuffer buf $ go handle
It seems to get closer to the performance of 'cat', but it is still clearly slower...
time ./Cat huge > /dev/null
./Cat huge > /dev/null 0.00s user 0.06s system 76% cpu 0.081 total
time cat huge > /dev/null
cat huge > /dev/null 0.00s user 0.05s system 75% cpu 0.063 total
I believe that with the buffer API we can avoid allocating all the buffer ByteStrings that hGetSome allocates in the original code, but I am only guessing here and do not know exactly what is happening in either compiled program...
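The guessed-at allocation difference can be made concrete with a small self-contained sketch (function names are mine, not from the benchmarked code): `copyAlloc` builds a fresh ByteString on every `hGetSome` call, while `copyReuse` moves bytes through a single 128K buffer via `hGetBuf`/`hPutBuf`, so no ByteStrings are allocated per iteration.

```haskell
module Main where

import Control.Monad (when)
import qualified Data.ByteString as B
import Foreign.Marshal.Alloc (allocaBytes)
import System.IO

chunk :: Int
chunk = 128 * 1024

-- Allocates a fresh ByteString on every hGetSome call.
copyAlloc :: Handle -> Handle -> IO ()
copyAlloc hIn hOut = loop
  where
    loop = do
      bs <- B.hGetSome hIn chunk
      when (not (B.null bs)) $ B.hPut hOut bs >> loop

-- Reuses one buffer for the whole copy; hGetBuf/hPutBuf move
-- raw bytes through it without constructing ByteStrings.
copyReuse :: Handle -> Handle -> IO ()
copyReuse hIn hOut = allocaBytes chunk $ \ptr ->
  let loop = do
        n <- hGetBuf hIn ptr chunk
        when (n > 0) $ hPutBuf hOut ptr n >> loop
  in loop

main :: IO ()
main = do
  -- Tiny demo: copy a generated file both ways and check the results.
  B.writeFile "in.tmp" (B.replicate (300 * 1024) 97)
  withFile "in.tmp" ReadMode $ \i ->
    withFile "out1.tmp" WriteMode $ \o -> copyAlloc i o
  withFile "in.tmp" ReadMode $ \i ->
    withFile "out2.tmp" WriteMode $ \o -> copyReuse i o
  a <- B.readFile "in.tmp"
  b <- B.readFile "out1.tmp"
  c <- B.readFile "out2.tmp"
  print (a == b && a == c)
```

Running both versions under `+RTS -s` should show the same gap in "bytes allocated" that the profiles below report.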
Update: adding the performance of the original code on my laptop:
time ./Cat2 huge > /dev/null
./Cat2 huge > /dev/null 0.12s user 0.10s system 99% cpu 0.219 total
Update 2: adding some basic profiling results:
Original code:
Cat2 +RTS -p -RTS huge
total time = 0.21 secs (211 ticks @ 1000 us, 1 processor)
total alloc = 6,954,068,112 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 100.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 46 0 100.0 100.0 100.0 100.0
CAF GHC.IO.Handle.FD 86 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 80 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 79 0 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 75 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 72 0 0.0 0.0 0.0 0.0
Buffer-API code:
Cat +RTS -p -RTS huge
total time = 0.06 secs (61 ticks @ 1000 us, 1 processor)
total alloc = 3,487,712 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 100.0 98.9
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 44 0 100.0 98.9 100.0 100.0
CAF GHC.IO.Handle.FD 85 0 0.0 1.0 0.0 1.0
CAF GHC.Conc.Signal 82 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 80 0 0.0 0.1 0.0 0.1
CAF GHC.IO.FD 79 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 71 0 0.0 0.0 0.0 0.0
Note in particular the huge difference in allocation costs...
Answer 1 (score: 14)
The initial question made me think it was about finding a performance problem in the exact code provided. Since the comment "I was hoping for a more idiomatic 'high-level' Haskell solution" contradicts that assumption, I will give a reasonably performing, idiomatic Haskell solution.
The way I expect any random programmer familiar with Haskell to solve this problem is with lazy ByteStrings. This allows the programmer to simply specify the task of reading input and writing output, while letting the compiler worry about the buffering and looping constructs.
module Main where

import System.IO
import System.Environment
import Data.ByteString.Lazy as BS
import Control.Monad

main :: IO ()
main = do
  file <- fmap Prelude.head getArgs
  handle <- openFile file ReadMode
  buf <- BS.hGetContents handle
  hPut stdout buf
The result is both more readable and better performing than the code in the original question:
timing 'cat'
real 0m0.075s
user 0m0.000s
sys 0m0.074s
timing strict bytestring with GHC -O2
real 0m0.254s
user 0m0.126s
sys 0m0.127s
timing strict bytestring with GHC -O2 -fllvm
real 0m0.267s
user 0m0.132s
sys 0m0.134s
timing lazy bytestring with GHC -O2
real 0m0.091s
user 0m0.023s
sys 0m0.067s
timing lazy bytestring with GHC -O2 -fllvm
real 0m0.091s
user 0m0.021s
sys 0m0.069s
That is, the lazy ByteString solution is 21% slower than `cat`. Running `cat` last, giving it preferential cache behavior, results in a 59ms runtime, which places the Haskell solution at 51% slower.
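For what it's worth, the same lazy approach condenses even further, since `Data.ByteString.Lazy.readFile` already reads the file lazily in chunks. This variant is mine and was not part of the benchmark above; `catFile` is a hypothetical helper name.

```haskell
module Main where

import qualified Data.ByteString.Lazy as BSL
import System.Environment (getArgs)
import System.IO

-- Stream a file to a handle; readFile is lazy, so this runs in
-- constant space, one chunk at a time.
catFile :: FilePath -> Handle -> IO ()
catFile file h = BSL.readFile file >>= BSL.hPut h

main :: IO ()
main = do
  file <- fmap head getArgs
  catFile file stdout
```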
Edit: Dons suggested that memory-mapped IO would more accurately model cat's behavior. I am not sure how accurate that statement is, but mmap almost always yields better performance, and this case is certainly no exception:
timing memory mapped lazy bytestring with GHC -O2
real 0m0.008s
user 0m0.004s
sys 0m0.003s
Produced by:
module Main where

import System.IO (stdout)
import System.Environment
import System.IO.Posix.MMap.Lazy
import Data.ByteString.Lazy (hPut)
import Control.Monad

main :: IO ()
main = do
  file <- fmap Prelude.head getArgs
  buf <- unsafeMMapFile file
  hPut stdout buf
Answer 2 (score: 5)
Remark post festum:
I am not sure the question makes much sense now that people have tuned things a bit. I wanted to see what bytestring-mmap was about, so I made a pipes version to 'correct' its lazy ByteString module: https://github.com/michaelt/pipes-bytestring-mmap . I then assembled all of these programs using sibi's testing method. The only modules in https://github.com/michaelt/pipes-bytestring-mmap/tree/master/bench that seem like anything but dumb bread-and-butter Haskell are the ones that use fancy explicit buffer management.
Anyway, here are some results. File size grows by 10x as we move right, and it is interesting to see how much the programs differ at different file sizes. The programs that do not use mmap only begin to show their character as 'linear in the length of the file' at 420M; at and after that point they are almost exactly the same, which suggests that the fairly different behavior at smaller sizes cannot be taken too seriously. The mmap programs behave similarly to each other, with a few curiosities (which I replicated). All of this is on OS X.
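For reference, the bread-and-butter Pipes.ByteString entry in the benchmark looks roughly like this (a sketch assuming the pipes and pipes-bytestring packages; `catHandles` is my name, not one from the linked repo):

```haskell
module Main where

import Pipes (runEffect, (>->))
import qualified Pipes.ByteString as PB
import System.Environment (getArgs)
import System.IO

-- Stream bytes from one handle to another in constant space;
-- Pipes.ByteString handles the chunking internally.
catHandles :: Handle -> Handle -> IO ()
catHandles hIn hOut = runEffect $ PB.fromHandle hIn >-> PB.toHandle hOut

main :: IO ()
main = do
  file <- fmap head getArgs
  withFile file ReadMode $ \h -> catHandles h stdout
```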
4200000 42000000 420000000 4200000000
timing 'cat'
real 0m0.006s real 0m0.013s real 0m0.919s real 0m8.154s
user 0m0.002s user 0m0.002s user 0m0.005s user 0m0.028s
sys 0m0.003s sys 0m0.009s sys 0m0.223s sys 0m2.179s
timing lazy bytestring - idiomatic Haskell (following Thomas M. DuBuisson)
real 0m0.009s real 0m0.025s real 0m0.894s real 0m9.146s
user 0m0.002s user 0m0.006s user 0m0.078s user 0m0.787s
sys 0m0.005s sys 0m0.016s sys 0m0.288s sys 0m3.001s
timing fancy buffering following statusfailed
real 0m0.014s real 0m0.066s real 0m0.876s real 0m8.686s
user 0m0.005s user 0m0.028s user 0m0.278s user 0m2.724s
sys 0m0.007s sys 0m0.035s sys 0m0.424s sys 0m4.232s
timing fancier use of GHC.Buf following bmk
real 0m0.011s real 0m0.018s real 0m0.831s real 0m8.218s
user 0m0.002s user 0m0.003s user 0m0.034s user 0m0.289s
sys 0m0.006s sys 0m0.013s sys 0m0.236s sys 0m2.447s
timing Pipes.ByteString following sibi
real 0m0.012s real 0m0.020s real 0m0.845s real 0m8.241s
user 0m0.003s user 0m0.004s user 0m0.020s user 0m0.175s
sys 0m0.007s sys 0m0.014s sys 0m0.239s sys 0m2.509s
然后使用mmap
timing Lazy.MMap following dons and Thomas M. DuBuisson
real 0m0.006s real 0m0.006s real 0m0.037s real 0m0.133s
user 0m0.002s user 0m0.002s user 0m0.006s user 0m0.051s
sys 0m0.003s sys 0m0.003s sys 0m0.013s sys 0m0.061s
timing Pipes.ByteString.MMap with SafeT machinery
real 0m0.006s real 0m0.010s real 0m0.051s real 0m0.196s
user 0m0.002s user 0m0.004s user 0m0.012s user 0m0.099s
sys 0m0.003s sys 0m0.005s sys 0m0.016s sys 0m0.072s
timing Pipes.ByteString.MMap 'withFile' style
real 0m0.008s real 0m0.008s real 0m0.142s real 0m0.134s
user 0m0.002s user 0m0.002s user 0m0.007s user 0m0.046s
sys 0m0.004s sys 0m0.004s sys 0m0.016s sys 0m0.066s