How to chunk with amazonka, conduit and lazy bytestring

Date: 2016-06-03 15:10:40

Tags: haskell conduit

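I wrote the code below to simulate an upload to S3 from a lazy ByteString (which will eventually be received over a network socket; here it is simulated by reading a file of about 100MB). The problem with this code is that it seems to force the entire file to be read into memory instead of chunking it (cbytes). Any pointers as to why the chunking doesn't work would be appreciated: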

{-# LANGUAGE OverloadedStrings #-} -- needed for the bucket, key and content-type string literals
import Control.Lens
import Network.AWS
import Network.AWS.S3
import Network.AWS.Data.Body
import System.IO
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sinkLbs,sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import           Network.HTTP.Conduit (responseBody,RequestBody(..),newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS

example :: IO PutObjectResponse
example = do
    -- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
    -- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
    -- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
    e <- newEnv NorthVirginia Discover

    -- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
    l <- newLogger Debug stdout

    -- The payload for the S3 object is read from a file that simulates a lazy ByteString received over the network:
    inb <- LBS.readFile "out"
    lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

    -- We now run the AWS computation with the overridden logger, performing the PutObject request:
    runResourceT . runAWS (e & envLogger .~ l) $
        send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")

main = example >> return ()

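Running the executable with the +RTS -s option shows that the whole content is read into memory (~113MB maximum residency; I did see ~87MB at one point). On the other hand, if I use chunkedFile, the upload is chunked correctly (maximum residency around 10MB).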

2 answers:

Answer 0 (score: 2)

This much is clear:

  inb <- LBS.readFile "out"
  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

should be rewritten as

  -- C is assumed to be: import qualified Data.Conduit.Binary as C  (from conduit-extra)
  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")

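As you have written it, the purpose of conduit is defeated. The entire file has to be accumulated by LBS.readFile, only to be broken back into chunks when it is fed to sourceLbs. (If lazy IO works as intended, this might not happen.) sourceFile, by contrast, reads the file incrementally, chunk by chunk. It is possible that, for example, toBody accumulates the whole file, in which case the point of conduit is defeated somewhere else; looking at the source of send and related functions, though, I can't see anything that would do this.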
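Applying that change to the whole program from the question gives roughly the following. It is a sketch that assumes the amazonka 1.4-era API the question already uses (newEnv taking a Region, the three-field ChunkedBody constructor) plus sourceFile from conduit-extra's Data.Conduit.Binary; the helper mkChunkedBody is hypothetical, and the bucket and key are the question's placeholders.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Control.Lens
import qualified Data.Conduit.Binary   as CB (sourceFile)
import           Network.AWS
import           Network.AWS.Data.Body
import           Network.AWS.S3
import           System.IO

-- | Hypothetical helper: build a streaming request body for a file.
-- Only the file size is read up front; the bytes themselves are read
-- incrementally by sourceFile while the request is sent, so no lazy
-- ByteString holding the whole file is ever created.
-- 1024 * 128 is the chunk size the question passes to ChunkedBody.
mkChunkedBody :: FilePath -> IO ChunkedBody
mkChunkedBody path = do
  len <- withFile path ReadMode hFileSize
  return (ChunkedBody (1024 * 128) (fromIntegral len) (CB.sourceFile path))

example :: IO PutObjectResponse
example = do
  e    <- newEnv NorthVirginia Discover
  l    <- newLogger Debug stdout
  body <- mkChunkedBody "out"
  runResourceT . runAWS (e & envLogger .~ l) $
    send (putObject "yourtestenv-change-it-please" "testbucket/test" (toBody body)
            & poContentType .~ Just "text; charset=UTF-8")

main :: IO ()
main = example >> return ()
```

If the data really does live in a file, amazonka's chunkedFile (which the question reports as keeping residency around 10MB) is the simpler route. For the real network case the same idea should carry over, though this is untested: give ChunkedBody a conduit source that reads from the socket, rather than a lazy ByteString that stays referenced for the whole upload.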

Answer 1 (score: 0)

I'm not sure, but I think the culprit is LBS.readFile. Its documentation says:

readFile :: FilePath -> IO ByteString

Read an entire file lazily into a ByteString.
The Handle will be held open until EOF is encountered.

chunkedFile works in a streaming, conduit-based way. Alternatively, you could use

sourceFile :: MonadResource m => FilePath -> Producer m ByteString

from conduit-extra (Data.Conduit.Binary) instead of LBS.readFile, but I'm not an expert.
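To see the two reading strategies side by side without AWS, here is a small sketch that needs only bytestring, conduit (1.2.8 or later, for runConduitRes and .|) and conduit-extra. Both functions walk a file chunk by chunk and return the total byte count and the largest chunk seen; the file name out-sample and the helpers step, lazyStats and conduitStats are hypothetical. Note that a lazy ByteString from LBS.readFile is itself a list of strict chunks read on demand, so it only stays in bounded memory while nothing else keeps a reference to it. In the question, the sourceLbs inb closure inside the request body plausibly keeps inb alive while the request is in flight; that is an assumption, not something confirmed from amazonka's source.

```haskell
module Main (main) where

import qualified Data.ByteString      as BS
import qualified Data.ByteString.Lazy as LBS
import           Data.Conduit         (runConduitRes, (.|))
import qualified Data.Conduit.Binary  as CB (sourceFile)
import qualified Data.Conduit.List    as CL (fold)
import           Data.List            (foldl')

-- | Accumulate (total bytes, largest chunk), forcing both counters each step
-- so no chain of thunks builds up.
step :: (Int, Int) -> BS.ByteString -> (Int, Int)
step (total, biggest) chunk =
  let n  = BS.length chunk
      t' = total + n
      b' = max biggest n
  in t' `seq` b' `seq` (t', b')

-- | Lazy IO: the chunks of inb are read on demand, and can be collected
-- as the fold moves on only because inb is not referenced afterwards.
lazyStats :: FilePath -> IO (Int, Int)
lazyStats path = do
  inb <- LBS.readFile path
  return $! foldl' step (0, 0) (LBS.toChunks inb)

-- | conduit: sourceFile yields one strict chunk at a time and closes the
-- handle when the stream ends; runConduitRes supplies the ResourceT.
conduitStats :: FilePath -> IO (Int, Int)
conduitStats path = runConduitRes (CB.sourceFile path .| CL.fold step (0, 0))

main :: IO ()
main = do
  -- Hypothetical 1 MiB sample file standing in for "out".
  BS.writeFile "out-sample" (BS.replicate (1024 * 1024) 120)
  lazyStats "out-sample" >>= print
  conduitStats "out-sample" >>= print
```

Running a build of this against a large file with +RTS -s is a quick way to compare the maximum residency of the two approaches in isolation, before bringing amazonka back in.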