I need to parse a huge log file. I want to do it in Haskell for learning purposes (I'm a beginner). The log file is laid out like this:
parameter a_parameter_name errors: 5
error bla bla1
error bla bla2
error bla bla bla3
error bla bla bla4
error bla bla bla5
some garbage line
parameter an_other_parameter_name errors: 7
error bla bla1
error bla bla2
error bla bla3
error bla bla4
error bla bla5
error bla bla6
error bla bla7
some garbage line
some garbage line
some garbage line
...
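This log file contains two main kinds of lines: "parameter" lines and "error" lines.
Each "error" line relates to the preceding parameter line. The other lines are not interesting.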
What I want to do is print the parameters, together with their errors, sorted by number of errors. So here I would like to get:
parameter an_other_parameter_name errors: 7
error bla bla1
error bla bla2
error bla bla3
error bla bla4
error bla bla5
error bla bla6
error bla bla7
parameter a_parameter_name errors: 5
error bla bla1
error bla bla2
error bla bla bla3
error bla bla bla4
error bla bla bla5
With the following code, I get the list of interesting lines:
import System.IO
import Data.List
interesting :: String -> Bool
interesting s = isPrefixOf "parameter" s || isPrefixOf "error" s
main = do
    logFile <- openFile "log.txt" ReadMode
    contents <- hGetContents logFile
    let interestingLines = filter interesting $ lines contents
    print interestingLines
    hClose logFile
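From this list, I would like to build a list of triples, [(parameter, errorsNb, [errors])], which I can then sort and print. But I don't know how to group the error lines with their related parameter line. Then again, maybe this is not the right approach... Any help is welcome!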
Oliver
Answer 0 (score: 1)
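I adapted my solution from CIS194 (week 2).
Converting this to a binary tree data structure and reading from the file lazily would be good learning exercises.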
type Name = String
type Count = Int

-- One parsed line of the log.
data MessageType = Param Name Count
                 | Error String
                 | Unknown String
    deriving (Show, Eq)

-- Classify a line by its first word. Note that `read` fails at runtime
-- if the count field is not a number.
parseMessage :: String -> MessageType
parseMessage line =
    case words line of
        ("parameter":n:_:c:_) -> Param n (read c)
        ("error":msg)         -> Error (unwords msg)
        xs                    -> Unknown $ unwords xs

-- A parameter together with the error messages that follow it.
data LogMessage = LogMessage Name Count [MessageType]
    deriving (Show, Eq)

parse :: String -> [MessageType]
parse = map parseMessage . lines

isError :: MessageType -> Bool
isError (Error _) = True
isError _         = False

isUnknown :: MessageType -> Bool
isUnknown (Unknown _) = True
isUnknown _           = False

-- Pointwise "or" of two predicates.
(.||.) :: (a -> Bool) -> (a -> Bool) -> (a -> Bool)
(.||.) f g a = f a || g a

-- Attach to each Param the Error lines that immediately follow it,
-- then skip ahead to the next Param.
toLogMsg :: [MessageType] -> [LogMessage]
toLogMsg [] = []
toLogMsg (x:xs) =
    case x of
        Param n c ->
            LogMessage n c (takeWhile isError xs) : toLogMsg (dropWhile (isError .||. isUnknown) xs)
        _ -> toLogMsg $ dropWhile (isError .||. isUnknown) xs

-- Extract the message texts; anything that is not an Error is skipped.
errMsgList :: [MessageType] -> [String]
errMsgList xs = [m | Error m <- xs]

-- foldl prepends each element, so the result is in reverse file order.
toTriple :: [LogMessage] -> [(String, Count, [String])]
toTriple = foldl (\acc (LogMessage n c xs) -> (n, c, errMsgList xs) : acc) []

main :: IO ()
main = do
    ts <- toLogMsg . parse <$> readFile "./src/2017/so-log.txt"
    mapM_ print ts
    mapM_ print (toTriple ts)
For the sample you provided, the output is:
LogMessage "a_parameter_name" 5 [Error "bla bla1",Error "bla bla2",Error "bla bla bla3",Error "bla bla bla4",Error "bla bla bla5"]
LogMessage "an_other_parameter_name" 7 [Error "bla bla1",Error "bla bla2",Error "bla bla3",Error "bla bla4",Error "bla bla5",Error "bla bla6",Error "bla bla7"]
("an_other_parameter_name",7,["bla bla1","bla bla2","bla bla3","bla bla4","bla bla5","bla bla6","bla bla7"])
("a_parameter_name",5,["bla bla1","bla bla2","bla bla bla3","bla bla bla4","bla bla bla5"])
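The question asks for the parameters to be printed in order of error count, most errors first. Here is a minimal, self-contained sketch of one way to do that, independent of the code above. It groups the lines with `span` and sorts with `sortOn` and `Down` from base. The names `Block`, `groupBlocks`, `errorCount`, `report` and `sample` are made up for illustration, and the sketch assumes that garbage lines may appear between error lines and should simply be skipped.

```haskell
import Data.List (isPrefixOf, sortOn)
import Data.Ord (Down (..))

-- One block: a "parameter" header line plus the "error" lines under it.
-- (Block, groupBlocks, errorCount and report are illustrative names.)
type Block = (String, [String])

-- Keep only the interesting lines, then attach each run of "error" lines
-- to the "parameter" line that precedes it.
groupBlocks :: [String] -> [Block]
groupBlocks = go . filter interesting
  where
    interesting l = "parameter" `isPrefixOf` l || "error" `isPrefixOf` l
    go [] = []
    go (l : ls)
      | "parameter" `isPrefixOf` l =
          let (errs, rest) = span ("error" `isPrefixOf`) ls
           in (l, errs) : go rest
      | otherwise = go ls -- an "error" line with no parameter before it

-- Count the collected lines rather than trusting the number in the header.
errorCount :: Block -> Int
errorCount = length . snd

-- Blocks with the most errors first, rendered in the original line format.
report :: String -> String
report = unlines . concatMap render . sortOn (Down . errorCount) . groupBlocks . lines
  where
    render (header, errs) = header : errs

-- A shortened, made-up sample in the same layout as the log in the question.
sample :: String
sample = unlines
  [ "parameter a_parameter_name errors: 2"
  , "error bla bla1"
  , "some garbage line"
  , "error bla bla2"
  , "parameter an_other_parameter_name errors: 3"
  , "error bla bla1"
  , "error bla bla2"
  , "error bla bla3"
  , "some garbage line"
  ]

main :: IO ()
main = putStr (report sample)
```

To run it on a real file, replace `main` with something like `readFile "log.txt" >>= putStr . report`. `readFile` reads lazily, but sorting still has to see every block before it can print the first one, so the grouped blocks are all held in memory.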