Question

The following code, taken from a library, parses CSV using text and attoparsec:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (many, (<|>))
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T

-- | Parse a field of a record.
field :: A.Parser T.Text -- ^ parser
field = fmap T.concat quoted <|> normal A.<?> "field"
  where
    normal  = A.takeWhile (A.notInClass "\n\r,\"")     A.<?> "normal field"
    quoted  = A.char '"' *> many between <* A.char '"' A.<?> "quoted field"
    between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")


-- | Parse a block of text into a CSV table.
comma :: T.Text                   -- ^ CSV text
      -> Either String [[T.Text]] -- ^ error | table
comma text
  | T.null text = Right []
  | otherwise   = A.parseOnly table text
  where
    table  = A.sepBy1 record A.endOfLine A.<?> "table"
    record = A.sepBy1 field (A.char ',') A.<?> "record"

This works for a wide range of inputs, but not when the input ends with a trailing \n.

Current behaviour:

> comma "hello\nworld"
Right [["hello"],["world"]]

> comma "hello\nworld\n"
Right [["hello"],["world"],[""]]

Desired behaviour:

> comma "hello\nworld"
Right [["hello"],["world"]]

> comma "hello\nworld\n"
Right [["hello"],["world"]]

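I have been trying to fix this, but I have run out of ideas. I am almost certain the solution has to involve A.endOfInput, since that is the significant anchor and the only "bonus" information we have. Any ideas on how to incorporate it into the code?

One possible approach is to inspect the end of the string before running the Attoparsec parser and drop the last character (or two characters in the case of \r\n), but this seems like a hacky solution that I would like to avoid in my code.

The full code of the library can be found here: https://github.com/lovasko/comma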
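For illustration, here is a minimal sketch of how the A.endOfInput idea might be worked into the parser without trimming the input beforehand. It assumes that a line break should only act as a record separator when more input follows it, and that one optional trailing line break is allowed before the end of input. The names `lineBreak` and `comma'` are hypothetical and not part of the library. Note that the added A.endOfInput also makes the parser reject unparsed leftovers, which the original version silently ignores.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (many, (<|>))
import Control.Monad (guard)
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T

-- Field parser, unchanged from the question.
field :: A.Parser T.Text
field = fmap T.concat quoted <|> normal A.<?> "field"
  where
    normal  = A.takeWhile (A.notInClass "\n\r,\"")     A.<?> "normal field"
    quoted  = A.char '"' *> many between <* A.char '"' A.<?> "quoted field"
    between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")

-- Hypothetical helper: a line break that is only accepted as a record
-- separator when it is not the very last thing in the input.
lineBreak :: A.Parser ()
lineBreak = A.endOfLine *> (A.atEnd >>= guard . not)

-- Hypothetical variant of 'comma' that tolerates one trailing line break.
comma' :: T.Text -> Either String [[T.Text]]
comma' text
  | T.null text = Right []
  | otherwise   = A.parseOnly table text
  where
    -- Records separated by non-final line breaks, then an optional
    -- trailing line break, then the end of the input.
    table  = A.sepBy1 record lineBreak
               <* A.option () A.endOfLine
               <* A.endOfInput A.<?> "table"
    record = A.sepBy1 field (A.char ',') A.<?> "record"

main :: IO ()
main = do
  print (comma' "hello\nworld")    -- Right [["hello"],["world"]]
  print (comma' "hello\nworld\n")  -- Right [["hello"],["world"]]
```

This relies on attoparsec always backtracking on failure: when `lineBreak` consumes the final line break and then finds the input exhausted, it fails, the consumed line break is restored, and A.sepBy1 stops with the records collected so far.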

CSV parsing issue with Attoparsec

0 answers: