Multi-line *non*-match with attoparsec

Time: 2016-02-12 09:19:25

Tags: haskell attoparsec

I've been playing around with parsing (PostgreSQL) logs, which can contain multi-line entries.

2016-01-01 01:01:01 entry1
2016-01-01 01:01:02 entry2a
    entry2b
2016-01-01 01:01:03 entry3

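So, with a Perl or Python script I'd just grab the next line and, if it didn't start with a timestamp, append it to the previous log entry. What is a sensible way to approach this with attoparsec hooked up to io-streams? I obviously want to do something with lookAhead and a failure to match a timestamp, but my brain is just missing something.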
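For comparison, the line-by-line approach described above might look like this in plain Haskell. This is only a minimal sketch that uses neither attoparsec nor io-streams; `startsWithTimestamp` and `groupEntries` are hypothetical names, and the timestamp shape is assumed from the sample log lines.

```haskell
import Data.Char (isDigit)

-- Assumption: an entry starts with a timestamp shaped like
-- "YYYY-MM-DD hh:mm:ss", as in the sample log above.
startsWithTimestamp :: String -> Bool
startsWithTimestamp line =
    length prefix == length shape && and (zipWith matches shape prefix)
  where
    shape  = "dddd-dd-dd dd:dd:dd"
    prefix = take (length shape) line
    matches 'd' c = isDigit c   -- 'd' stands for any digit
    matches s   c = s == c      -- everything else must match literally

-- Group each timestamped line with the continuation lines that follow it.
-- A leading line without a timestamp simply starts its own group.
groupEntries :: [String] -> [[String]]
groupEntries []       = []
groupEntries (l : ls) = (l : continuation) : groupEntries rest
  where
    (continuation, rest) = break startsWithTimestamp ls
```

Calling `groupEntries (lines contents)` on the sample log would yield three groups, with the `entry2b` line attached to the second one.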

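Nope, I still can't see it. I've stripped down what I have. Parsing a single line is easy. What I can't figure out is how to parse "until" another parser matches: I can see the lookAhead function that I could use, but I don't know how to apply a "not" condition.

I can't see how to do the matching. It's entirely possible that my brain has just seized up.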

{-# LANGUAGE OverloadedStrings #-}

module DummyParser (
    LogStatement (..), parseLogLine
    -- and, so we can test it...
    , LogTimestamp , parseTimestamp
    , parseSqlStmt
    , newLineAndTimestamp
) where

{-  we want to parse...
TIME001 statement: SELECT true;
TIME002 statement: SELECT 'b',
  'c';
TIME003 statement: SELECT 3;
-}

import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

type LogTimestamp = Int

data LogStatement = LogStatement {
     l_ts  :: LogTimestamp
    ,l_sql :: String
} deriving (Eq, Show)


restOfLine :: Parser B.ByteString
restOfLine = do
    rest <- takeTill (== '\n')
    isEOF <- atEnd
    if isEOF then
        return rest
    else
        (char '\n') >> return rest


-- e.g. TIME001
parseTimestamp :: Parser LogTimestamp
parseTimestamp  = do
  string "TIME"
  digits  <- count 3 digit
  return (read digits)


-- e.g. statement: SELECT 1
parseSqlStmt :: Parser String
parseSqlStmt = do
    string "statement: "
    -- How can I match until the next timestamp?
    sql <- restOfLine
    return (B.unpack sql)


newLineAndTimestamp :: Parser LogTimestamp
newLineAndTimestamp = (char '\n') *> parseTimestamp


spaces :: Parser ()
spaces = do
    skipWhile (== ' ')


-- e.g. TIME001 statement: SELECT * FROM schema.table;
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    let ls = LogStatement log_ts log_sql
    return ls

Edit: So, this is what I ended up with, thanks to the help here.

-- requires: import Control.Applicative ((<|>))
isTimestampNext :: Parser ()
isTimestampNext = lookAhead parseTimestamp *> pure ()

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    extraLines <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    let ls = LogStatement log_ts (log_sql ++ (B.unpack $ B.concat extraLines))
    return ls
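
One thing to watch with the version above: `B.concat extraLines` joins the continuation lines without the newlines that `restOfLine` consumed. Below is a self-contained sketch of the same `manyTill` idea that puts the newlines back with `B.intercalate`. The `parseLog` helper is a hypothetical name added for the demo, and `restOfLine` is condensed from the definition in the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative              (many, (<|>))
import           Control.Monad                    (void)
import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B

type LogTimestamp = Int

data LogStatement = LogStatement
    { l_ts  :: LogTimestamp
    , l_sql :: String
    } deriving (Eq, Show)

-- Take everything up to the next newline (consuming it) or to end of input.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* (void (char '\n') <|> endOfInput)

-- e.g. TIME001
parseTimestamp :: Parser LogTimestamp
parseTimestamp = read <$> (string "TIME" *> count 3 digit)

-- Succeeds without consuming input when a timestamp comes next.
isTimestampNext :: Parser ()
isTimestampNext = () <$ lookAhead parseTimestamp

parseLogLine :: Parser LogStatement
parseLogLine = do
    ts    <- parseTimestamp
    skipWhile (== ' ')
    _     <- string "statement: "
    first <- restOfLine
    -- Keep taking lines until the input ends or the next entry starts.
    extra <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    pure (LogStatement ts (B.unpack (B.intercalate "\n" (first : extra))))

-- Hypothetical helper: parse a whole log held in memory.
parseLog :: B.ByteString -> Either String [LogStatement]
parseLog = parseOnly (many parseLogLine <* endOfInput)
```

The order inside `manyTill` matters: the end condition is tried before each `restOfLine`, so the loop stops at end of input instead of repeatedly matching an empty line.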

1 Answer:

Answer 0 (score: 1)

I've shared this combinator on a number of attoparsec questions:

-- uses optional (Control.Applicative), guard (Control.Monad), isNothing (Data.Maybe)
notFollowedBy p = optional p >>= guard . isNothing

Your solution would then look something like:

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    newlineLeftover <- ((notFollowedBy parseTimestamp) *> parseSqlStmt) <|> pure ""
    let ls = LogStatement log_ts (log_sql ++ newlineLeftover)
    return ls

The right-hand side of the *> in the newlineLeftover expression needs more work, I guess, but that's the general idea.
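
One way the right-hand side might be filled in, as a sketch rather than a definitive implementation: treat every line that is not at end of input and does not start with a timestamp as a continuation line, and collect them with `many`. Here `notFollowedBy` is written so that it succeeds only when its argument fails. `continuationLine` and `parseLog` are hypothetical names, and `restOfLine` is condensed from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative              (many, optional, (<|>))
import           Control.Monad                    (guard, void)
import           Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8            as B
import           Data.Maybe                       (isNothing)

type LogTimestamp = Int

data LogStatement = LogStatement
    { l_ts  :: LogTimestamp
    , l_sql :: String
    } deriving (Eq, Show)

-- Succeeds only when p does NOT match at the current position.
-- attoparsec backtracks on failure, so no input is consumed either way.
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p = optional p >>= guard . isNothing

restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* (void (char '\n') <|> endOfInput)

parseTimestamp :: Parser LogTimestamp
parseTimestamp = read <$> (string "TIME" *> count 3 digit)

-- A line that belongs to the previous entry. The endOfInput check makes
-- sure the parser always consumes something, so `many` cannot loop forever.
continuationLine :: Parser B.ByteString
continuationLine = do
    notFollowedBy endOfInput
    notFollowedBy parseTimestamp
    restOfLine

parseLogLine :: Parser LogStatement
parseLogLine = do
    ts    <- parseTimestamp
    skipWhile (== ' ')
    _     <- string "statement: "
    first <- restOfLine
    extra <- many continuationLine
    pure (LogStatement ts (B.unpack (B.intercalate "\n" (first : extra))))

-- Hypothetical helper: parse a whole log held in memory.
parseLog :: B.ByteString -> Either String [LogStatement]
parseLog = parseOnly (many parseLogLine <* endOfInput)
```

Compared with a single optional leftover line, `many continuationLine` handles entries that span any number of lines, and continuation lines no longer need the "statement: " prefix.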