Parsing a file with Parsec

Time: 2015-06-27 04:44:29

Tags: haskell parsec

I am trying to parse a file [1], but the code below does not work correctly.

import Control.Applicative hiding ( many , ( <|> )  )
import Text.Parsec
import Text.Parsec.String (Parser) 
import qualified Text.Parsec.Token as T
import Text.Parsec.Language (emptyDef)

--Scheme Code;ISIN Div Payout/ ISIN Growth;ISIN Div Reinvestment;Scheme Name;Net Asset Value;Repurchase Price;Sale Price;Date
--120523;INF846K01ET8;INF846K01EU6;Axis Triple Advantage Fund - Direct Plan - Dividend Option;13.3660;13.2323;13.3660;19-Jun-2015

data MFund = MFund Integer  String String String Double Double Double String deriving (Show) --String String String  deriving (Show)

lexer :: T.TokenParser st
lexer  = T.makeTokenParser emptyDef

natural :: Parser Integer
natural = T.natural lexer

float :: Parser Double
float = T.float lexer

eol :: Parser String
eol =  try (string "\r\n")
   <|> try (string "\n")
   <|> try (string "\r")

parseFund :: Parsec String () MFund
parseFund = MFund <$> natural 
              <*> (char ';' *> (many1 alphaNum <|> string "-")) 
              <*> (char ';' *> (many1 alphaNum <|> string "-")) 
              <*> (char ';' *> manyTill anyChar (char ';'))
              <*> float
              <*> (char ';' *> float)
              <*> (char ';' *> float)
              <*> (char ';' *> many1 (alphaNum <|> char '-'))



parseBlockFund :: Parsec String () [MFund]
parseBlockFund = manyTill anyChar eol *> space *> eol *> endBy parseFund eol

parseMutual :: Parsec String () [MFund]
parseMutual = concat <$> (manyTill anyChar eol *> space *> eol *>
          sepBy parseBlockFund (space *> eol))

{- each scheme is separated by space and end of line -}
parseSchemeBlock :: Parsec String () [MFund]
parseSchemeBlock = concat <$> (sepBy parseMutual (space *> eol))

parseFile :: Parsec String () [MFund]
parseFile = 
   string "Scheme Code;ISIN Div Payout/ ISIN Growth;ISIN Div Reinvestment;Scheme Name;Net Asset Value;Repurchase Price;Sale Price;Date" *>
   eol *> space *> eol *> parseSchemeBlock <* eof


main :: IO ()
main = do 
 input <- readFile "data.txt"
 print input
 case parse  parseFile  "" input  of
    Left err -> print err
    Right val -> print val

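I start parsing the file from parseFile, which consumes the header line, an end of line, a space, and another end of line. I then parse the scheme blocks, which are separated from each other by a space and an end of line. Each scheme block contains several mutual fund blocks, which are also separated by a space and an end of line, so I first consume a string up to the end of line, then a space and an end of line, and then call parseBlockFund.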
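One detail worth checking in the "space and end of line" separator: Parsec's `space` accepts any whitespace character, newlines included, so `space *> eol` fails on a completely empty line because `space` has already eaten the line ending. The following is a minimal sketch, not a fix confirmed against the real file; `lineEnd`, `strictSep` and `blankLine` are hypothetical names, and it assumes separator lines contain only spaces or tabs.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same idea as eol in the question (hypothetical name to avoid a clash).
lineEnd :: Parser String
lineEnd = try (string "\r\n") <|> string "\n" <|> string "\r"

-- The separator as written in the question: exactly one whitespace
-- character (which may itself be a newline), then a line ending.
strictSep :: Parser String
strictSep = space *> lineEnd

-- A more tolerant variant: zero or more spaces/tabs, then a line ending.
blankLine :: Parser String
blankLine = skipMany (oneOf " \t") *> lineEnd

runOn :: Parser a -> String -> Either ParseError a
runOn p = parse p "(example)"
```

With these definitions, `runOn strictSep "\n"` fails while `runOn blankLine "\n"` succeeds, and both accept `" \r\n"`.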

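The problem, I suspect, is that when parseMutual reaches the end of a scheme block, it consumes the space and end of line that should have been consumed by parseSchemeBlock. I tried using the try function, but it did not help, so I am not sure whether I am thinking about this correctly. Can someone tell me what is wrong with this code?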
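The suspicion can be reproduced in isolation. In Parsec, `sepBy p sep` fails outright when `sep` has consumed input and the following `p` then fails, because the loop has already committed to another element. The sketch below uses hypothetical stand-ins (`item`, `sepLine`) rather than the real fund parsers, and shows one possible workaround: wrapping "separator then item" together in `try`, so the trailing separator is left for an enclosing parser.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins: an item is a run of digits,
-- a separator is a semicolon followed by a newline.
item :: Parser String
item = many1 digit

sepLine :: Parser ()
sepLine = string ";\n" *> pure ()

-- sepBy commits once the separator has consumed input, so a separator
-- that is not followed by another item becomes a hard failure.
plain :: Parser [String]
plain = item `sepBy` sepLine

-- try covers the separator and the item together, so the loop stops
-- cleanly and the unmatched separator stays in the input.
backtracking :: Parser [String]
backtracking = (:) <$> item <*> many (try (sepLine *> item))

runOn :: Parser a -> String -> Either ParseError a
runOn p = parse p "(example)"
```

On the input `"1;\n2;\nx"`, `runOn plain` returns a `Left`, while `runOn backtracking` returns `Right ["1","2"]`. Note that `try` only helps here when it spans both the separator and the item; placing it around the separator alone does not undo the consumption once the item fails.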

Interestingly, when I remove eof, this code is able to parse at least one scheme block [2].
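The behaviour without eof is consistent with how Parsec treats leftover input: a parser that stops early without a consumed failure still succeeds, and the unparsed remainder is silently dropped unless eof (or something like `getInput`) checks for it. A minimal sketch with a hypothetical `record` parser, not the fund parser from the question:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record: a run of digits terminated by a newline.
record :: Parser String
record = many1 digit <* newline

-- Without eof, the parse succeeds on whatever prefix matched.
prefixOnly :: Parser [String]
prefixOnly = many record

-- With eof, any unparsed remainder turns into a parse error.
wholeInput :: Parser [String]
wholeInput = many record <* eof

-- For debugging: return the parsed records together with the leftover input.
withRest :: Parser ([String], String)
withRest = (,) <$> many record <*> getInput

runOn :: Parser a -> String -> Either ParseError a
runOn p = parse p "(example)"
```

For the input `"1\n2\nabc\n"`, `prefixOnly` yields `Right ["1","2"]`, `wholeInput` yields a `Left`, and `withRest` yields `Right (["1","2"],"abc\n")`. Inspecting the leftover this way may help locate the exact line where the real parser gives up.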

[1] http://portal.amfiindia.com/spages/NAV0.txt

[2] https://github.com/mukeshtiwari/Puzzles/blob/master/Assigment/Api.hs

0 answers:

No answers