Parsing multiline logs with attoparsec

Date: 2019-01-31 19:41:48

Tags: parsing haskell logging attoparsec

I am trying to parse a multiline log like this:

[xxx] This is 1
[xxx] This is also 1
[yyy] This is 2

I defined these types:

{-# LANGUAGE OverloadedStrings #-}

module Parser where

import Prelude hiding (takeWhile)
import Data.Attoparsec.Text
import Data.Text (unpack)

data ID    = ID String deriving (Eq, Show)
data Entry = Entry ID String deriving (Eq, Show)
data Block = Block ID [String]
data Log   = Log [Block]

and these parsers:

parseID :: Parser ID
parseID = do
  char '['
  id <- takeTill ( == ']' )
  char ']'
  return $ ID $ unpack id

parseEntry :: Parser Entry
parseEntry = do
  id <- parseID
  char ' '
  content <- takeTill isEndOfLine
  return $ Entry id (unpack content)

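This works fine when I run something like parseOnly parseEntry entryString: I get back an Entry.

The problem comes when I try to parse something like the log shown at the beginning. I would get an [Entry], but I want a [Block].

I also want two or more consecutive lines with the same ID (for example xxx) to be stored in the same block, so for the log above I would like to get back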

[block1, block2]
-- block1 == Block (ID "xxx") ["This is 1", "This is also 1"]
-- block2 == Block (ID "yyy") ["This is 2"]

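How can I make the parser either start a new block or append to the most recently created one, depending on whether the ID has changed?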

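One obvious solution is to simply produce an [Entry] and then use a fold to convert it into a [Block] with the right logic. But that means two passes, one over the log and another over the [Entry], which not only seems poor for performance on large logs but also feels like the wrong approach (from my limited knowledge of attoparsec).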
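For reference, the two-pass idea described above might look roughly like the sketch below. It uses groupBy from Data.List rather than a hand-written fold, and the names entryId and toBlocks are made up for illustration; the types are repeated, with a deriving clause added to Block, so that the snippet compiles on its own.

```haskell
import Data.Function (on)
import Data.List (groupBy)

data ID    = ID String deriving (Eq, Show)
data Entry = Entry ID String deriving (Eq, Show)
data Block = Block ID [String] deriving (Eq, Show)

-- Hypothetical helper: the ID an entry belongs to.
entryId :: Entry -> ID
entryId (Entry i _) = i

-- Hypothetical second pass: each run of consecutive entries sharing an ID
-- becomes one Block. groupBy never yields an empty group, so the
-- pattern run@(e : _) always matches.
toBlocks :: [Entry] -> [Block]
toBlocks entries =
  [ Block (entryId e) [c | Entry _ c <- run]
  | run@(e : _) <- groupBy ((==) `on` entryId) entries
  ]
```

Only adjacent entries are merged, so a later run with an ID seen earlier starts a new Block.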

Any other ideas?

Edit

Bob Dalgleish's solution essentially works (thanks a lot!!!); it only needed a few tweaks. Here is my final solution:

data ID    = ID String deriving (Eq, Show)
data Entry = Entry ID String deriving (Eq, Show)
data Block = Block ID [String] deriving (Eq, Show)
data Log   = Log [Block] deriving (Eq, Show)

parseID :: Parser ID
parseID = do
  char '['
  id <- takeTill ( == ']' )
  char ']'
  return $ ID $ unpack id

parseEntry :: Parser Entry
parseEntry = do
  id <- parseID
  char ' '
  content <- takeTill isEndOfLine
  return $ Entry id (unpack content)

-- (<|>) below comes from Control.Applicative, which must be imported as well.
parseEntryFor :: ID -> Parser Entry
parseEntryFor blockId = do
  id <- parseID
  if blockId == id
    then do
      char ' '
      content <- takeTill isEndOfLine
      endOfLine <|> endOfInput
      return $ Entry id (unpack content)
    else fail "nonmatching id"

parseBlock :: Parser Block
parseBlock = do
  Entry entryId s <- parseEntry
  endOfLine <|> endOfInput
  entries <- many' (parseEntryFor entryId)
  return $ Block entryId (s : [s' | Entry _ s' <- entries])

1 Answer:

Answer 0 (score: 1)

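You need a parser for Blocks. It accepts an Entry, then looks ahead for further Entry values with the same ID; as soon as the ID differs, it backtracks and returns what it has collected so far.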

First, let's introduce a conditional Entry parser:

parseEntryFor :: ID -> Parser Entry
parseEntryFor blockId = do
  id <- parseID
  if blockId == id
    then do
      char ' '
      content <- takeTill isEndOfLine
      endOfLine  -- requires every line, including the last, to end with a newline
      return $ Entry id (unpack content)
    else fail "nonmatching id"

-- | A Block consists of one or more Entries with the same ID.
parseBlock :: Parser Block
parseBlock = do
  Entry entryId s <- parseEntry
  endOfLine
  entries <- many' (parseEntryFor entryId)
  return $ Block entryId (s : [s' | Entry _ s' <- entries])

(This code is untested, as I have only used Parsec.)
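To try the lookahead-and-backtrack idea end to end, here is a minimal self-contained sketch. It is a variation, not the code from the question or the answer: it keeps IDs and contents as Text instead of wrapping them in ID and String, and the names tag, restOfLine, block and parseLog are made up for illustration. It relies on attoparsec restoring the input when a parser fails, so a line carrying a different tag simply ends the current block.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Text (Text)

-- Simplified block type for this sketch: the tag and its lines, both as Text.
data Block = Block Text [Text] deriving (Eq, Show)

-- Parses the "[id] " prefix of a line and returns the id.
tag :: Parser Text
tag = char '[' *> takeTill (== ']') <* char ']' <* char ' '

-- Parses the remainder of a line; the last line may lack a trailing newline.
restOfLine :: Parser Text
restOfLine = takeTill isEndOfLine <* (endOfLine <|> endOfInput)

-- One block: a first line, then every following line with the same tag.
block :: Parser Block
block = do
  i     <- tag
  first <- restOfLine
  -- A line with a different tag makes `string` fail; attoparsec backtracks,
  -- many' stops, and that line is left for the next block.
  rest  <- many' (string ("[" <> i <> "] ") *> restOfLine)
  pure (Block i (first : rest))

-- Parses a whole log, requiring that all input is consumed.
parseLog :: Text -> Either String [Block]
parseLog = parseOnly (many' block <* endOfInput)
```

For the sample log from the question, parseLog should return one block for xxx with two lines, followed by one block for yyy with a single line.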