Question

Here is an example of the file I am trying to parse:

XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :      7
BEG PER: 03/17/2014            CURRENT COMPANY                       03/18/2014
END PER: 03/18/2014       QA PROCESS - REJECT REPORT                   20:28:36

BATCH: 123456789 CONTRIB: 987654321 - ABCDE FGHI-SAN DIEGO
                                                    QUOTE BACK: 1A23B45C79

CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE END DATE ERR
------ -------------------- --- -------------------- -------- -------- ---
12345  1234567890001        AB  ABCDE FGHI PRODUCTS  20140314 20140914 059


XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :      8
BEG PER: 03/17/2014            CURRENT COMPANY                       03/18/2014
END PER: 03/18/2014       QA PROCESS - REJECT REPORT                   20:28:36

BATCH: 234567890 CONTRIB: 987654321 - ABCDE FGHI-SAN DIEGO
                                                    QUOTE BACK: 5F7A657G87

CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE END DATE ERR
------ -------------------- --- -------------------- -------- -------- ---
12346  2345678901           AB  ABCDE FGHI PRODUCTS  20140129 20140729 059
12346  3456789012           AB  ABCDE FGHI PRODUCTS  20140317 20140917 059


XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :      9
BEG PER: 03/17/2014            CURRENT COMPANY                       03/18/2014
END PER: 03/18/2014       QA PROCESS - REJECT REPORT                   20:28:36

BATCH: 345678901 CONTRIB: 987654321 - ABCDE FGHI-SAN DIEGO
                                                    QUOTE BACK: 6K75L8791L

CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE END DATE ERR
------ -------------------- --- -------------------- -------- -------- ---
12346  4567890123           AB  ABCDE FGHI PRODUCTS  20140317 20140917 059
12346  4567890123           AB  ABCDE FGHI PRODUCTS  20140317 20140917 059
 NUMBER OF SETS REJECTED ARE :         13  TOTAL SETS IN BATCH:     16,940

                           *** END OF REPORT ***

Here is a series of snippets from my module:

module XX00135 (parseFile) where

import Control.Applicative ((<$>), (<*>), (<*))
import Text.ParserCombinators.Parsec hiding (Line)

data Line = Line { code    :: String
                 , account :: String
                 , aType   :: String
                 , company :: String
                 , begDate :: String
                 , endDate :: String
                 , errCode :: String }

data Page = Page { periodBeginning :: String
                 , periodEnd       :: String
                 , reportDate      :: String
                 , batch           :: String
                 , contrib         :: String
                 , quoteBack       :: String
                 , lineList        :: [Line] }

data Report = Report { pages :: [Page] }


parseReportDate :: Parser String
parseReportDate =
  manyTill anyChar (string "CURRENT COMPANY") >> spaces >> count 10 anyChar

headers :: Parser String
headers =
  choice [ try (string "\n")
         , try (string "CODE   ACCOUNT NO           TYP COMPANY NAME         BEG DATE     END DATE ERR")
         , try (string "------ -------------------- --- -------------------- -------- -------- ---") ]

line :: Parser Line
line =
  Line <$> count  6 anyChar <* space
       <*> count 20 anyChar <* space
       <*> count  3 anyChar <* space
       <*> count 20 anyChar <* space
       <*> count  8 anyChar <* space
       <*> count  8 anyChar <* space
       <*> count  3 anyChar <* newline

page :: Parser Page
page =
  Page <$> (manyTill anyChar (string "BEG PER:")    >> space >> count 10 anyChar)
       <*> parseReportDate
       <*> (manyTill anyChar (string "END PER:")    >> space >> count 10 anyChar)
       <*> (manyTill anyChar (string "BATCH:")      >> space >> count  9 anyChar)
       <*> (space >> string "CONTRIB:"              >> space >> count  9 anyChar)
       <*> (manyTill anyChar (string "QUOTE BACK:") >> space >> count 10 anyChar
       <*   skipMany1 headers)
       <*> (manyTill line (twoNewLines <|> footer))

report :: Parser Report
report = Report <$> manyTill page (try footer)

twoNewLines :: Parser ()
twoNewLines = (count 2 newline) >> return ()

footer :: Parser ()
footer = (space >> string "NUMBER OF SETS REJECTED" >> manyTill anyChar (string "*** END OF REPORT ***") >> optional eof) >> return ()

parseFile :: [(String, String)] -> String -> String
parseFile errors text =
  let rs = case parse (manyTill report eof) "" text of
      ...

The full file has 115 lines. When I cat the file and pipe it into my Haskell program, I get:

(line 116, column 1);
unexpected end of input
expecting "BEG PER:"

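Before, I simply discarded the footer and everything after it. But my full use case is to cat multiple files and pipe them into my Haskell program, which means I can't throw away the footer and everything after it. Once I started trying to skip the footer rather than discard the rest of the input, my problems began. It is probably something simple, and I am just confused and overlooking the obvious.

If you need more code, please let me know. I do some transformation after the parse, and I didn't want to clutter the code with unnecessary detail.

Thanks!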

Answer 1

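I have solved this problem. The code is a bit different now, and I'm not sure exactly which change fixed it. I spent a lot of time staring at the code and making small changes here and there. I think, though, that it had to do with the newline at the end of each file, which is still in the input when the files are concatenated with cat. So I changed footer: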

footer = space >> string "NUMBER OF SETS REJECTED"
       >> anyChar `manyTill` (string "*** END OF REPORT ***") >> newline >> string ""

Now footer consumes the extra newline at the end of the file and returns a string. I use footer in eop (end of page):

eop =
  choice [ count 2 newline
         , footer ]
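To make the effect of that trailing newline concrete, here is a minimal sketch, not the actual module. It assumes a shortened footer text, and the names footerStrict, footerLenient, endOfPage and sampleFooter are hypothetical, introduced only for illustration. The `try` wrappers are also my addition, so that a partial match does not consume input.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical footer that stops right after the end-of-report marker,
-- leaving the file's trailing newline in the input.
footerStrict :: Parser String
footerStrict =
  space >> string "NUMBER OF SETS REJECTED"
        >> anyChar `manyTill` try (string "*** END OF REPORT ***")

-- Same footer, but it also consumes the newline that ends the file.
footerLenient :: Parser String
footerLenient = footerStrict <* newline

-- End of page: either the two blank lines between pages or the footer.
endOfPage :: Parser String
endOfPage = choice [ try (count 2 newline), footerLenient ]

-- Shortened stand-in for the last lines of a report file (assumption).
sampleFooter :: String
sampleFooter =
  " NUMBER OF SETS REJECTED ARE : 13\n\n   *** END OF REPORT ***\n"
```

In this sketch, `parse (footerStrict <* eof) "" sampleFooter` fails because one newline is left over, while `footerLenient` reaches end of input. That leftover newline would fit the reported position (line 116, column 1) for a 115-line file. With the lenient version, `many1 footerLenient` also accepts two footers back to back, which is the situation cat creates.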

I use eop in the last line of page:

<*> line `manyTill` eop

report is now:

report = count 2 newline >> Report <$> many page

I also changed page. I think it was consuming input with anyChar in unexpected ways. So now I throw away the first line of each page explicitly:

page = firstLine >>
  Page <$> (string "BEG PER:" >> space >> count 10 anyChar)
       ...

firstLine =
  string "XX00135                   ABCDEFGHIJ RISK SOLUTIONS            PAGE NO :"
  >> spaces >> many digit >> newline
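The difference between scanning for the label and anchoring on a known first line can be sketched as follows. This is an illustration under assumptions, not the real page parser: the header line is shortened, and scanForPeriod, anchoredPeriod and pageStart are hypothetical names.

```haskell
import Text.ParserCombinators.Parsec

-- Scans forward over anything until the label appears. On leftover input
-- that holds no label (for example a lone trailing newline) it consumes
-- characters, reaches end of input, and fails.
scanForPeriod :: Parser String
scanForPeriod =
  manyTill anyChar (try (string "BEG PER:")) >> space >> count 10 anyChar

-- Skips exactly one known header line (shortened here), then expects the
-- label right away. It fails without consuming input if no page starts
-- here, so `many` can stop cleanly.
anchoredPeriod :: Parser String
anchoredPeriod =
  headerLine >> string "BEG PER:" >> space >> count 10 anyChar
  where
    headerLine = string "XX00135" >> manyTill anyChar newline

-- Shortened stand-in for the start of one page (assumption).
pageStart :: String
pageStart = "XX00135   PAGE NO :   7\nBEG PER: 03/17/2014"
```

Given `pageStart ++ "\n"`, `many scanForPeriod` fails: after the first match it swallows the remaining newline while looking for another "BEG PER:" and runs into end of input. `many anchoredPeriod` returns the single date and leaves the newline for whichever parser comes next. Note also that a bare `string` inside `manyTill` consumes input on a partial match, which is why the scanning version wraps it in `try`.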

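I think that covers all the important changes I made that finally got the parse to succeed. It now parses a single file piped from cat, as well as multiple files concatenated by cat. Hooray! I love Haskell.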

Answer 2

It looks like page consumes the footer:

  <*> (manyTill line (twoNewLines <|> footer))

so report never gets to consume the footer:

report = Report <$> manyTill page (try footer)

Maybe you need sepBy to recognize the twoNewLines between your pages (without the final manyTill).
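A rough sketch of that shape, assuming a heavily simplified page format rather than the real fixed-width report. All names here (simplePage, pageBreak, simpleFooter, simpleReport, simpleReports, oneReport) are hypothetical, and `sepBy1` is used instead of `sepBy` on the assumption that every report has at least one page.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical, heavily simplified page: a "PAGE n" line followed by one
-- or more all-digit data lines.
simplePage :: Parser (String, [String])
simplePage = do
  n  <- string "PAGE " >> many1 digit <* newline
  ls <- many1 (many1 digit <* newline)
  return (n, ls)

-- Two blank lines separate pages; `try` keeps a lone newline from being
-- consumed when the second one is missing.
pageBreak :: Parser ()
pageBreak = try (count 2 newline) >> return ()

-- Simplified footer that also consumes the newline ending the file.
simpleFooter :: Parser ()
simpleFooter = string "END OF REPORT" >> newline >> return ()

-- The footer is parsed once per report, after all pages, so pages never
-- need to know about it.
simpleReport :: Parser [(String, [String])]
simpleReport = (simplePage `sepBy1` pageBreak) <* simpleFooter

-- Several reports back to back, as produced by `cat a b`.
simpleReports :: Parser [[(String, [String])]]
simpleReports = many1 simpleReport <* eof

-- Made-up input in the simplified format (assumption).
oneReport :: String
oneReport = "PAGE 1\n10\n\n\nPAGE 2\n20\n21\nEND OF REPORT\n"
```

Running `parse simpleReports "" (oneReport ++ oneReport)` yields two reports of two pages each. The design point is that only the report-level parser mentions the footer, so a page can no longer consume it before report looks for it.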

Haskell Parsec: unexpected end of input
