Parsing a string with Parsec when it must end with a specific word?

Time: 2018-11-10 07:36:55

Tags: haskell parsec

I am working through some programming exercises. The input for the ones I am looking at has the following format:

Give xxxxxxxxx as yyyy.

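xxxxxxxx can take several formats that recur throughout these exercises: binary (groups of eight digits, separated by spaces), hexadecimal (no spaces), or octal (groups of up to three digits). I have already written parsers for these formats, but they all trip over the " as". They looked like this: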

binaryParser = BinaryQuestion  <$> (count 8 ( oneOf "01") ) `sepBy1` space

I worked around it in the following awkward way (unnecessary code trimmed):

{-# LANGUAGE OverloadedStrings #-}
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

wrapAs :: Parser a -> Parser [a]
wrapAs kind = manyTill kind (try (string " as"))
inputParser :: Parser Input
inputParser = choice [try binaryParser, try (space >> hexParser), try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> wrapAs (space >> count 8 ( oneOf "01") )
hexParser :: Parser Input
hexParser = HexQuestion <$> wrapAs (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$> wrapAs (many1 space >> many1 (oneOf ['0'..'7']))

questionParser :: Parser Question
questionParser = do
  string "Give"
  inp <- inputParser 
  string " a "
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

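I don't like that I need to use the trailing string " as" inside the Input parsers, and they are generally less readable. With a regular expression, handling a trailing string would be trivial. So I am not satisfied with my solution.

Is there a way to reuse the "nice" parsers, or at least to write more readable ones?

Additional notes

The code I would like to get working is the following: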

{-# LANGUAGE OverloadedStrings #-}

import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

inputParser :: Parser Input
inputParser = choice [try binaryParser, try hexParser, try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `sepBy1` space
hexParser :: Parser Input
hexParser = HexQuestion <$> many1 (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$>  (many1 (oneOf ['0'..'7'])) `sepBy1` space

questionParser :: Parser Question
questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

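But parseTest questionParser test3 gives me parse error at (line 1, column 22): unexpected "a".

I guess the problem is that a space is used as the separator inside the input but also appears in the as a string. I can't see any function in parsec that fits. In frustration I tried adding try in various places, but without success.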
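The failure can be reproduced in isolation. The following is a minimal sketch, assuming plain String input (Text.Parsec.String) instead of ByteString; the names octGroup, octGroups, withTail, okResult and badResult are made up for this illustration. sepBy1 consumes the space in front of "as" while looking for another group, and since input has been consumed by the time the group parser fails on 'a', the whole sepBy1 fails instead of stopping cleanly.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One group of octal digits, as in the question's octParser.
octGroup :: Parser String
octGroup = many1 (oneOf ['0'..'7'])

-- Same shape as the question's parser: groups separated by a single space.
octGroups :: Parser [String]
octGroups = octGroup `sepBy1` space

-- Groups followed by the literal tail " as".
withTail :: Parser [String]
withTail = octGroups <* string " as"

-- Succeeds: the input ends right after the last group.
okResult :: Either ParseError [String]
okResult = parse (octGroups <* eof) "" "164 151"

-- Fails: sepBy1 consumes the space before "as", then octGroup fails on 'a'.
badResult :: Either ParseError [String]
badResult = parse withTail "" "164 151 as"
```

Here okResult is a Right value and badResult is a Left value, which matches the error reported by parseTest above.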

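2 answers:

Answer 0: (score: 1)

You are working with the pattern Give {source} as a {target}. So you can build a pipeline:

  • a parser for Give
  • a parser for {source}
  • a parser for as a
  • a parser for {target}

There is no need to wrap the parser for as a inside the parser for {source}.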
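One possible way to write such a pipeline is sketched below. This is not taken from the answer: it assumes String input instead of ByteString, and sepBefore is a hypothetical helper that accepts a space as a separator only when another digit follows it, so the space in front of "as a" is left for the pipeline step that expects it.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Input = BinaryQuestion [String]
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving (Show, Eq)
data Target = Word deriving (Show, Eq)
data Question = Question Input Target deriving (Show, Eq)

-- Hypothetical helper: a space counts as a separator only if one of the
-- given digit characters follows; otherwise try undoes the consumed space.
sepBefore :: [Char] -> Parser Char
sepBefore digits = try (space <* lookAhead (oneOf digits))

binaryParser, hexParser, octParser :: Parser Input
binaryParser = BinaryQuestion <$> count 8 (oneOf "01") `sepBy1` sepBefore "01"
hexParser    = HexQuestion    <$> many1 (count 2 hexDigit)
octParser    = OctalQuestion  <$> many1 (oneOf ['0'..'7']) `sepBy1` sepBefore ['0'..'7']

sourceParser :: Parser Input
sourceParser = choice [try binaryParser, try hexParser, try octParser]

targetParser :: Parser Target
targetParser = string "word" >> return Word

-- The pipeline: Give, {source}, as a, {target}.
questionParser :: Parser Question
questionParser = do
  _   <- string "Give" >> many1 space
  src <- sourceParser
  _   <- many1 space >> string "as a" >> many1 space
  tar <- targetParser
  _   <- char '.'
  eof
  return (Question src tar)
```

With this separator the source parsers stay free of the " as" string, and parse questionParser "" "Give 164 151 155 145 as a word." yields an OctalQuestion with the four groups.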

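Answer 1: (score: 1)

Edit:

As noted in the comments, the previous solution (kept at the end of this post) does not allow the clean parsers to be reused.

This led to a small parser, built with Parsec, that handles every way a space-separated string of numbers can end, i.e.

  1. ends with a space followed by a non-digit character, e.g. "..11 as"
  2. ends with a space, e.g. "..11 "
  3. ends at eof, e.g. "..11"

and the parser is as follows: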

-- Parses space-separated groups of the given digits; stops without consuming
-- the terminator (a space before a non-digit, a space before eof, or eof).
numParser :: (Parser Char -> Parser String) -> [Char] -> Parser [String]
numParser repeatParser digits =
    let digitParser = repeatParser $ oneOf digits
        endParser = (try $ lookAhead $ (space >> noneOf digits)) <|>
                    (try $ lookAhead $ (space <* eof))           <|>
                    (eof >> return ' ')
    in do first <- digitParser
          rest <- manyTill (space >> digitParser) endParser
          return (first : rest)

binaryParser and octParser need to be modified as follows:

binaryParser = BinaryQuestion <$> numParser (count 8) "01"
octParser    = OctalQuestion  <$> numParser many1 ['0'..'7']
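The three ending cases listed above can be checked with a small self-contained sketch. It restates numParser over String input (an assumption made so the snippet runs on its own), and the names octal, case1, case2 and case3 are made up for the illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same logic as numParser above, restated over String input.
numParser :: (Parser Char -> Parser String) -> [Char] -> Parser [String]
numParser repeatParser digits = do
    first <- digitParser
    rest  <- manyTill (space >> digitParser) endParser
    return (first : rest)
  where
    digitParser = repeatParser (oneOf digits)
    -- Every alternative only peeks at the input; nothing is consumed.
    endParser = try (lookAhead (space >> noneOf digits))
            <|> try (lookAhead (space <* eof))
            <|> (eof >> return ' ')

octal :: Parser [String]
octal = numParser many1 ['0'..'7']

case1, case2, case3 :: Either ParseError [String]
case1 = parse octal "" "164 151 as"  -- a space, then a non-digit character
case2 = parse octal "" "164 151 "    -- a trailing space, then end of input
case3 = parse octal "" "164 151"     -- end of input
```

All three cases return the same two groups, and in the first case the " as" is left unconsumed for whatever parser runs next.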

Nothing needs to change in the question's questionParser; for reference, I restate it here:

questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space       -- no need to change this to many
  string "as a"
  many1 space     
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

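Previous solution:

The functions endBy1 and many in Text.Parsec are helpful in this situation.

Replace sepBy1 with endBy1:

binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `endBy1` space

and

octParser = OctalQuestion <$> (many1 (oneOf ['0'..'7'])) `endBy1` space

Unlike sepBy1, endBy1 expects a separator after every item, so the space after the last number is consumed as well, i.e.

Give 164 151 155 145 as a word.
                    ^ this space will be consumed

So instead of checking for one or more spaces before "as a ...", the parser has to check for zero or more spaces, which is why the many function is used there instead of many1.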
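The code that followed this explanation is not preserved here. A rough reconstruction of what the endBy1 variant could look like is sketched below, assuming String input instead of ByteString; it is an illustration of the description above, not the answerer's original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Input = BinaryQuestion [String]
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving (Show, Eq)
data Target = Word deriving (Show, Eq)
data Question = Question Input Target deriving (Show, Eq)

targetParser :: Parser Target
targetParser = string "word" >> return Word

inputParser :: Parser Input
inputParser = choice [try binaryParser, try hexParser, try octParser]

-- endBy1 also consumes the space after the last group.
binaryParser, hexParser, octParser :: Parser Input
binaryParser = BinaryQuestion <$> count 8 (oneOf "01") `endBy1` space
hexParser    = HexQuestion    <$> many1 (count 2 hexDigit)
octParser    = OctalQuestion  <$> many1 (oneOf ['0'..'7']) `endBy1` space

questionParser :: Parser Question
questionParser = do
  _   <- string "Give"
  _   <- many1 space
  inp <- inputParser
  _   <- many space      -- zero or more: endBy1 may already have eaten the space
  _   <- string "as a"
  _   <- many1 space
  tar <- targetParser
  _   <- char '.'
  eof
  return (Question inp tar)
```

The drawback mentioned in the edit still applies: binaryParser and octParser now require a trailing space, so they cannot be reused on input that ends right after the last group.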