Question

I am working through some programming exercises. The ones I am looking at use the following input format:

Give xxxxxxxxx as yyyy.

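xxxxxxxx can be in one of several formats that recur throughout these exercises. In particular, it is either binary (groups of eight digits, separated by spaces), hexadecimal (no spaces), or octal (groups of up to three digits). I have written parsers for these formats, but they all trip over the " as". They looked like this: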

binaryParser = BinaryQuestion  <$> (count 8 ( oneOf "01") ) `sepBy1` space

I worked around this with the following monstrosity (unnecessary code trimmed):

{-# LANGUAGE OverloadedStrings #-}
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

wrapAs :: Parser a -> Parser [a]
wrapAs kind = manyTill kind (try (string " as"))
inputParser :: Parser Input
inputParser = choice [try binaryParser, try (space >> hexParser), try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> wrapAs (space >> count 8 ( oneOf "01") )
hexParser :: Parser Input
hexParser = HexQuestion <$> wrapAs (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$> wrapAs (many1 space >> many1 (oneOf ['0'..'7']))

questionParser :: Parser Question
questionParser = do
  string "Give"
  inp <- inputParser 
  string " a "
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

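I don't like that I have to mention the trailing string " as" inside the parsers for Input, and they are generally less readable. I mean, handling a trailing string would be trivial with a regular expression. So I am not satisfied with my solution.

Is there a way to reuse the "nice" parsers, or at least to use more readable ones?

Additional notes

The code I wish would work is the following: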

{-# LANGUAGE OverloadedStrings #-}

import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

inputParser :: Parser Input
inputParser = choice [try binaryParser, try hexParser, try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `sepBy1` space
hexParser :: Parser Input
hexParser = HexQuestion <$> many1 (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$>  (many1 (oneOf ['0'..'7'])) `sepBy1` space

questionParser :: Parser Question
questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

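But parseTest questionParser test3 gives me parse error at (line 1, column 22): unexpected "a"

I think the problem is that the space is used as a separator inside the input but also appears before the as a string. I can't see any function in parsec that would fit. Out of frustration I tried adding try in various places, but without success.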
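The failure described above can be reproduced in isolation. The following is a minimal sketch, assuming String input via Text.Parsec.String instead of the ByteString parser used in the question; the names octGroups, okCase and failCase are made up for the illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Octal groups separated by single spaces, as in the question's octParser.
octGroups :: Parser [String]
octGroups = many1 (oneOf ['0'..'7']) `sepBy1` space

-- Succeeds: the input ends right after the last group.
okCase :: Either ParseError [String]
okCase = parse octGroups "" "164 151"

-- Fails: after "151" the separator `space` matches and is consumed,
-- then the next group fails on 'a'. Because input was already consumed,
-- `sepBy1` reports an error instead of stopping after "151".
failCase :: Either ParseError [String]
failCase = parse octGroups "" "164 151 as a word."
```

Wrapping the whole parser in try does not help here, because try only rewinds the input when the wrapped parser fails; it does not make sepBy1 stop one group earlier.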

Answer 1

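You are working with the following pattern: Give {source} as a {target}. So you can build a pipeline:

a parser for Give
a parser for {source}
a parser for as a
a parser for {target}

There is no need to wrap the parser for {source} with the parser for as a.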
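The pipeline could be sketched roughly as follows. This is only an illustration under assumptions that are not part of the answer: it uses String input, handles only the octal case, and sourceP, targetP and pipeline are hypothetical names. The source parser consumes a space only when another group really follows it (via try), which is one possible way to keep it independent of the as a stage.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical source parser (not from the answer): octal groups, where a
-- space is only consumed if another group follows it.
sourceP :: Parser [String]
sourceP = (:) <$> group <*> many (try (space *> group))
  where group = many1 (oneOf ['0'..'7'])

-- Target parser: only "word" is supported, as in the question.
targetP :: Parser String
targetP = string "word"

-- The four stages run one after another; none of them wraps another.
pipeline :: Parser ([String], String)
pipeline = do
  _   <- string "Give" <* many1 space
  src <- sourceP <* many1 space
  _   <- string "as a" <* many1 space
  tgt <- targetP <* char '.' <* eof
  return (src, tgt)
```

For example, parse pipeline "" "Give 164 151 155 145 as a word." runs all four stages in order, and the source stage leaves the space before as untouched.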

Answer 2

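Edit:

As noted in the comments, the clean parsers cannot be reused with my previous solution, which is kept at the end of this post.

That led me to write a small parser with Parsec that handles every way a space-separated string of numbers can end, i.e.

ending with a space followed by non-digit characters, e.g. "..11 as"
ending with a space, e.g. "..11 "
ending with eof, e.g. "..11"

The parser is as follows: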

numParser:: (Parser Char->Parser String)->[Char]->Parser [String]
numParser repeatParser digits = 
    let digitParser = repeatParser $ oneOf digits
        endParser = (try $ lookAhead $ (space >> noneOf digits)) <|>
                    (try $ lookAhead $ (space <* eof))           <|> 
                    (eof >> return ' ')
    in do init <- digitParser
          rest <- manyTill (space >> digitParser) endParser
          return (init : rest)

binaryParser and octParser need to be modified as follows:

binaryParser = BinaryQuestion <$> numParser (count 8) "01"
octParser    = OctalQuestion  <$> numParser many1 ['0'..'7']

Nothing needs to change in the questionParser from the question; for reference, here it is again:

questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space       -- no need to change this to many
  string "as a"
  many1 space     
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar
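For reference, the pieces above can be assembled into one self-contained module. This is a sketch under a few assumptions that differ from the original code: it parses String rather than ByteString, renames the local init to first to avoid shadowing the Prelude function, adds Eq instances, and introduces a hypothetical helper runQuestion for trying the parser out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Input = BinaryQuestion [String]
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving (Show, Eq)
data Target = Word deriving (Show, Eq)
data Question = Question { input :: Input, target :: Target }
  deriving (Show, Eq)

-- Space-separated digit groups; stops before a space that is followed by
-- a non-digit, before a trailing space, or at end of input.
numParser :: (Parser Char -> Parser String) -> [Char] -> Parser [String]
numParser repeatParser digits =
  let digitParser = repeatParser (oneOf digits)
      endParser = try (lookAhead (space >> noneOf digits))
              <|> try (lookAhead (space <* eof))
              <|> (eof >> return ' ')
  in do first <- digitParser
        rest  <- manyTill (space >> digitParser) endParser
        return (first : rest)

binaryParser, hexParser, octParser, inputParser :: Parser Input
binaryParser = BinaryQuestion <$> numParser (count 8) "01"
hexParser    = HexQuestion    <$> many1 (count 2 hexDigit)
octParser    = OctalQuestion  <$> numParser many1 ['0'..'7']
inputParser  = choice [try binaryParser, try hexParser, try octParser]

targetParser :: Parser Target
targetParser = string "word" >> return Word

questionParser :: Parser Question
questionParser = do
  _   <- string "Give"
  _   <- many1 space
  inp <- inputParser
  _   <- many1 space
  _   <- string "as a"
  _   <- many1 space
  tar <- targetParser
  _   <- char '.'
  eof
  return (Question inp tar)

-- Hypothetical helper (not in the original): run the parser on a String.
runQuestion :: String -> Either ParseError Question
runQuestion = parse questionParser ""
```

Calling runQuestion on the three test sentences from the question should yield a BinaryQuestion, a HexQuestion and an OctalQuestion respectively, because each try in inputParser rewinds the input when an earlier alternative fails partway through.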

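Previous solution:

The functions endBy1 and many from Text.Parsec are helpful in this situation.

Replace sepBy1 with endBy1, as in

binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `endBy1` space

and

octParser = OctalQuestion <$> (many1 (oneOf ['0'..'7'])) `endBy1` space

Unlike sepBy1, endBy1 reads the next few characters to decide whether to stop parsing, so the one space after the last number is consumed, i.e.

Give 164 151 155 145 as a word.
                    ^ this space will be consumed

So, instead of checking for one or more spaces before "as a...", the parser needs to check for zero or more spaces, which is why the many function is used instead of many1. Now the code becomes:

questionParser = do
  string "Give"
  many1 space
  inp <- inputParser
  many space        -- changed from many1 to many
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar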
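To see which input endBy1 leaves behind, the parser can be paired with getInput. This is a small sketch assuming String input; withRest, octEndBy, consumed and missingSpace are hypothetical names used only for the demonstration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser and also return whatever input it left unconsumed.
withRest :: Parser a -> String -> Either ParseError (a, String)
withRest p = parse ((,) <$> p <*> getInput) ""

-- Octal groups, each of which must be followed by a space.
octEndBy :: Parser [String]
octEndBy = many1 (oneOf ['0'..'7']) `endBy1` space

-- The space after "151" is consumed, so the remaining input starts at "as".
consumed :: Either ParseError ([String], String)
consumed = withRest octEndBy "164 151 as a word."

-- With no space after the last group, endBy1 fails: the group "151" is
-- consumed, but the mandatory separator is missing.
missingSpace :: Either ParseError ([String], String)
missingSpace = withRest octEndBy "164 151"
```

This also shows the limitation of the approach: the number parsers now depend on a trailing space being present, so they are not reusable on input that ends directly after the last group.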

Parsing a string with Parsec that needs to end with a specific word?