Parsing a log file in Haskell

Time: 2017-03-09 18:15:40

Tags: haskell

I need to parse a huge log file. I'd like to do it in Haskell for learning purposes (I'm a beginner). The layout of the log file looks like this:

parameter a_parameter_name errors: 5
error bla bla1
error bla bla2
error bla bla bla3
error bla bla bla4
error bla bla bla5
some garbage line
parameter an_other_parameter_name errors: 7
error bla bla1
error bla bla2
error bla bla3
error bla bla4
error bla bla5
error bla bla6
error bla bla7

some garbage line
some garbage line
some garbage line
...

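This log file contains two main kinds of lines:

  1. lines starting with "parameter";
  2. lines starting with "error".

Each "error" line belongs to the preceding "parameter" line. All other lines are uninteresting.

What I want to do is print the parameters together with their errors, sorted by number of errors. So for the sample above I would like to get: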

    parameter an_other_parameter_name errors: 7
    error bla bla1
    error bla bla2
    error bla bla3
    error bla bla4
    error bla bla5
    error bla bla6
    error bla bla7
    parameter a_parameter_name errors: 5
    error bla bla1
    error bla bla2
    error bla bla bla3
    error bla bla bla4
    error bla bla bla5
    

With the following code, I get the list of interesting lines:

    import System.IO
    import Data.List
    
    interesting :: String -> Bool
    interesting s = isPrefixOf "parameter" s || isPrefixOf "error" s
    
    main = do
        logFile <- openFile "log.txt" ReadMode
        contents <- hGetContents logFile
        let interestingLines = filter interesting $ lines contents
        print interestingLines
        hClose logFile
    

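From this list, I would like to build a list of triples, [(parameter, errorsNb, [errors])], which I could then sort and print. But I don't know how to group the error lines with their related parameter line. Maybe this isn't the right approach at all... Any help is welcome!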

Oliver
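One possible way to do the grouping described above is sketched here. It assumes the lines have already been filtered with `interesting`, so that every "error" line directly follows either its "parameter" line or another "error" line; `span` can then split off the run of errors that belongs to each parameter. The names `Entry`, `groupEntries`, `render` and `report` are made up for this illustration, and the count is taken from the "errors: N" field rather than recomputed.

```haskell
import Data.List (isPrefixOf, sortOn)
import Data.Ord (Down (..))
import Text.Read (readMaybe)

-- One parameter block: (parameter name, declared error count, error lines).
type Entry = (String, Int, [String])

interesting :: String -> Bool
interesting s = "parameter" `isPrefixOf` s || "error" `isPrefixOf` s

-- Walk the (already filtered) lines once. Each well-formed "parameter" line
-- opens a block that owns the run of "error" lines right after it.
-- Lines that do not open a block are skipped.
groupEntries :: [String] -> [Entry]
groupEntries [] = []
groupEntries (l : ls) = case words l of
  ("parameter" : name : _ : count : _)
    | Just n <- readMaybe count ->
        let (errs, rest) = span ("error" `isPrefixOf`) ls
         in (name, n, errs) : groupEntries rest
  _ -> groupEntries ls

-- Turn one entry back into the lines of the requested report.
render :: Entry -> [String]
render (name, n, errs) = ("parameter " ++ name ++ " errors: " ++ show n) : errs

-- Filter, group, sort by declared error count (largest first), and render.
report :: String -> [String]
report =
  concatMap render
    . sortOn (\(_, n, _) -> Down n)
    . groupEntries
    . filter interesting
    . lines

main :: IO ()
main = do
  contents <- readFile "log.txt" -- same file name as in the question
  mapM_ putStrLn (report contents)
```

Because the garbage lines are removed before grouping, an "error" line that comes after a garbage line is still attached to the last "parameter" line seen, which matches the rule stated in the question.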

1 Answer:

Answer 0: (score: 1)

I adapted my solution to CIS194 (week 2).
Converting to a binary tree data structure and reading from the file lazily would be good learning exercises.

type Name = String
type Count = Int
data MessageType =  Param Name Count
                 | Error String
                 | Unknown String
                   deriving (Show, Eq)

parseMessage :: String -> MessageType
parseMessage line =
    case  words line of
      ("parameter":n:_:c:_) -> Param n (read c)
      ("error":msg)         -> Error (unwords msg)
      xs                    -> Unknown $ unwords xs

data LogMessage = LogMessage Name Count [MessageType]
               deriving (Show, Eq)

parse :: String -> [MessageType]
parse = map parseMessage .  lines

isError :: MessageType -> Bool
isError (Error _) = True
isError _ = False


isUnknown :: MessageType -> Bool
isUnknown  (Unknown _)  = True
isUnknown _ = False

(.||.) :: (a -> Bool) -> (a -> Bool) -> (a -> Bool)
(.||.) f g a = (f a) || (g a)

toLogMsg :: [MessageType] -> [LogMessage]
toLogMsg [] = []
toLogMsg (x:xs) =
    case x of
      Param n c ->
          LogMessage n c (takeWhile isError xs) : toLogMsg (dropWhile (isError .||. isUnknown) xs)
      _         -> toLogMsg $ dropWhile (isError .||. isUnknown) xs



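-- Collect the text of every Error entry; anything else is skipped
-- instead of causing a pattern-match failure.
errMsgList :: [MessageType] -> [String]
errMsgList ms = [m | Error m <- ms]

-- The left fold conses onto the accumulator, so the triples come out in
-- reverse input order (this does not sort by error count).
toTriple :: [LogMessage] -> [(String, Count, [String])]
toTriple = foldl (\acc (LogMessage n c xs) -> (n, c, errMsgList xs) : acc) []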



main :: IO ()
main = do
       ts <- toLogMsg . parse <$> readFile "./src/2017/so-log.txt"
       mapM_ print ts
       mapM_ print (toTriple ts)

For the sample you provided, the output is:

LogMessage "a_parameter_name" 5 [Error "bla bla1",Error "bla bla2",Error "bla bla bla3",Error "bla bla bla4",Error "bla bla bla5"]
LogMessage "an_other_parameter_name" 7 [Error "bla bla1",Error "bla bla2",Error "bla bla3",Error "bla bla4",Error "bla bla5",Error "bla bla6",Error "bla bla7"]
("an_other_parameter_name",7,["bla bla1","bla bla2","bla bla3","bla bla4","bla bla5","bla bla6","bla bla7"])
("a_parameter_name",5,["bla bla1","bla bla2","bla bla bla3","bla bla bla4","bla bla bla5"])
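
The binary-tree exercise suggested at the top of this answer could look roughly like the sketch below: the triples are inserted into an unbalanced binary search tree keyed on the error count and read back with a right-to-left in-order traversal, which yields them from most to fewest errors. The names `Tree`, `insert`, `toDescList` and `sortByCountDesc` are hypothetical and not part of the code above, and the sample values are copied from the output shown.

```haskell
-- A plain, unbalanced binary search tree holding a key and a payload.
data Tree k v = Leaf | Node (Tree k v) k v (Tree k v)

-- Smaller keys go left; equal or larger keys go right, so entries with
-- the same key are all kept.
insert :: Ord k => k -> v -> Tree k v -> Tree k v
insert k v Leaf = Node Leaf k v Leaf
insert k v (Node l k' v' r)
  | k < k' = Node (insert k v l) k' v' r
  | otherwise = Node l k' v' (insert k v r)

-- In-order traversal from right to left: largest key first.
toDescList :: Tree k v -> [(k, v)]
toDescList Leaf = []
toDescList (Node l k v r) = toDescList r ++ [(k, v)] ++ toDescList l

-- Sort triples by their error count, largest first, by going through the tree.
sortByCountDesc :: [(String, Int, [String])] -> [(String, Int, [String])]
sortByCountDesc ts =
  map snd (toDescList (foldr (\t@(_, c, _) -> insert c t) Leaf ts))

-- Shortened sample data in the shape produced by toTriple.
sample :: [(String, Int, [String])]
sample =
  [ ("a_parameter_name", 5, ["bla bla1", "bla bla2"])
  , ("an_other_parameter_name", 7, ["bla bla1", "bla bla2", "bla bla3"])
  ]

main :: IO ()
main = mapM_ print (sortByCountDesc sample)
```

Since the tree is not balanced, already sorted input degrades it to a linked list; for real use, `sortOn` from `Data.List` or a `Data.Map` would probably be the more practical choice.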