Conduit and Attoparsec: unexpected termination on parse error

Time: 2015-12-09 18:40:35

Tags: haskell conduit attoparsec

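I'm trying to convert a log file parser I wrote a while back to conduit, and I've run into a problem. I'll simplify the details of the parser itself, since they aren't relevant to the question. I have a log file that looks like this: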

200 GET
404 POST
500 GET
FOO
301 PUT
302 GET
201 POST

So the parsing code is very simple:

data SimpleLogEntry = SimpleLogEntry {
      status :: Int
    , method :: String
} deriving (Show, Eq)


parseHTTPStatus :: Parser Int
parseHTTPStatus = validate <$> decimal
    where validate d = if (d >= 200 && d < 999) then d else 100


parseHTTPMethod :: Parser String
parseHTTPMethod =
        (stringCI "GET" *> return "Get")
    <|> (stringCI "POST" *> return "Post")
    <|> (stringCI "PUT" *> return "Put")
    <|> return "Unknown"


parseLogLine :: Parser SimpleLogEntry
parseLogLine = fmap SimpleLogEntry
        parseHTTPStatus
    <*> (space *> parseHTTPMethod)

So far so good. Here is how I implemented this with conduit:

import Prelude hiding (lines)

import Control.Applicative
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (runResourceT, ResourceT)
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Conduit
import qualified Data.Conduit.Attoparsec as CA
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL


logLines:: Source (ResourceT IO) B.ByteString
logLines = CB.sourceFile "~/test.log" $= CB.lines


parseEntry :: ConduitM B8.ByteString SimpleLogEntry (ResourceT IO) ()
parseEntry = CA.conduitParserEither parseLogLine =$= awaitForever go
    where
        go (Left err) = liftIO $ putStrLn ("Got an error: " ++ CA.errorMessage err)
        go (Right (_, logEntry)) = yield logEntry


sink :: Sink SimpleLogEntry (ResourceT IO) ()
sink = CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a status: " ++ (show . status) t)


main :: IO ()
main = runResourceT $ logLines $= parseEntry $$ sink

When I run main, I get this output:

Got a status: 200
Got a status: 404
Got a status: 500
Got an error: Failed reading: takeWhile1

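I can't understand why the conduit terminates at this point instead of going on to parse the next line of the file, which is what I want it to do. From reading the documentation for Data.Conduit.Attoparsec, this looks like exactly the use case conduitParserEither was designed for.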

Update

Per @Fabian, it turns out that conduitParserEither is not what I wanted. Here is a definition of parseEntry that does what I was after:

parseEntry' :: ConduitM B8.ByteString SimpleLogEntry (ResourceT IO) ()
parseEntry' = (CL.map (parseOnly parseLogLine)) =$= awaitForever go
    where
        go (Left err) = liftIO $ putStrLn ("Got an error: " ++ err)
        go (Right logEntry) = yield logEntry
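
For anyone who wants to try the per-line idea without the conduit plumbing, here is a minimal pure sketch of it. It assumes the same parser as in the question (folded into one definition) and an in-memory copy of the sample log; the names sampleLog, parseLines and report are made up for this illustration. Each line gets its own parseOnly call, so a failure on one line cannot affect the next.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, decimal, parseOnly, space, stringCI)
import qualified Data.ByteString.Char8 as B8

data SimpleLogEntry = SimpleLogEntry
  { status :: Int
  , method :: String
  } deriving (Show, Eq)

-- Same grammar as in the question: "<status> <method>".
parseLogLine :: Parser SimpleLogEntry
parseLogLine = SimpleLogEntry <$> parseStatus <*> (space *> parseMethod)
  where
    parseStatus = validate <$> decimal
    validate d = if d >= 200 && d < 999 then d else 100
    parseMethod =
          (stringCI "GET"  *> pure "Get")
      <|> (stringCI "POST" *> pure "Post")
      <|> (stringCI "PUT"  *> pure "Put")
      <|> pure "Unknown"

-- Hypothetical in-memory stand-in for the log file from the question.
sampleLog :: B8.ByteString
sampleLog = B8.unlines
  ["200 GET", "404 POST", "500 GET", "FOO", "301 PUT", "302 GET", "201 POST"]

-- Run a fresh parser on every line; each result is independent of the others.
parseLines :: B8.ByteString -> [Either String SimpleLogEntry]
parseLines = map (parseOnly parseLogLine) . B8.lines

-- Mirrors the two branches of `go` in parseEntry'.
report :: Either String SimpleLogEntry -> IO ()
report (Left err)    = putStrLn ("Got an error: " ++ err)
report (Right entry) = putStrLn ("Got a status: " ++ show (status entry))

main :: IO ()
main = mapM_ report (parseLines sampleLog)
```

With this shape the FOO line shows up as a single Left and the three lines after it are still parsed, which is the behaviour parseEntry' gives inside the pipeline.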

1 Answer:

Answer 0: (score: 0)

conduitParserEither (or conduitParser) can also be used with multiple tokens on a single line, so input in which several entries share a line produces the same result.

So it makes sense that the parser does not continue, because it does not know where the next token starts.
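
To make that concrete, here is a rough pure model of what the parser sees, not how conduit is actually implemented. CB.lines strips the newlines, so the chunks reach the parser as one continuous stream with no line boundaries. The sketch below imitates that by concatenating the sample lines and applying the question's parser repeatedly with many'; the names streamView and parsedPrefix are made up for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, decimal, parseOnly, space, stringCI)
import Data.Attoparsec.Combinator (many')
import qualified Data.ByteString.Char8 as B8

data SimpleLogEntry = SimpleLogEntry
  { status :: Int
  , method :: String
  } deriving (Show, Eq)

-- Same grammar as in the question: "<status> <method>".
parseLogLine :: Parser SimpleLogEntry
parseLogLine = SimpleLogEntry <$> parseStatus <*> (space *> parseMethod)
  where
    parseStatus = validate <$> decimal
    validate d = if d >= 200 && d < 999 then d else 100
    parseMethod =
          (stringCI "GET"  *> pure "Get")
      <|> (stringCI "POST" *> pure "Post")
      <|> (stringCI "PUT"  *> pure "Put")
      <|> pure "Unknown"

-- The sample log with its newlines removed, i.e. roughly what a parser
-- placed downstream of CB.lines receives.
streamView :: B8.ByteString
streamView = B8.concat
  ["200 GET", "404 POST", "500 GET", "FOO", "301 PUT", "302 GET", "201 POST"]

-- Apply the entry parser as many times as it succeeds. Parsing stops at the
-- first position where it fails, and nothing marks where to resume.
parsedPrefix :: Either String [SimpleLogEntry]
parsedPrefix = parseOnly (many' parseLogLine) streamView

main :: IO ()
main = print parsedPrefix
```

Only the three entries before FOO come back. Everything after the failure point is just more bytes to the parser, which is why splitting into lines first and parsing each line separately is the approach that recovers.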