Question

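I'm writing an RCS parser with nom. RCS files tend to be ISO-8859-1 encoded. One of the grammar productions is String. It is @-delimited, and a literal @ symbol is escaped as @@.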

@A String@ -> A String
@A @@ String@ -> A @ String
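One detail worth spelling out about the encoding: every ISO-8859-1 byte maps directly to the Unicode code point with the same value, so decoding can never fail, whereas str::from_utf8 rejects a lone byte of 0x80 or above. A minimal sketch, assuming a hypothetical helper named latin1_to_string (not part of nom or of the code in this question):

```rust
/// Decode ISO-8859-1 bytes into a String.
/// Each byte 0x00..=0xFF becomes the char with the same code point.
/// (Hypothetical helper for illustration only.)
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

For example, the bytes b"caf\xE9" decode to "café" here, while str::from_utf8 on the same bytes returns an error.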

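I have a working function (see the end). IResult is from nom: you return the parsed value together with the remaining unparsed input, or else Error / Incomplete. Cow is used to return a reference built on the original input slice if no unescaping was needed, or an owned String if it was.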
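To make the borrow-or-own idea concrete, here is a minimal sketch that works on the body of a string only, that is, the text between the outer @ delimiters, already decoded to &str. The function name unescape is an assumption for illustration, and this is separate from the full parser at the end of the question:

```rust
use std::borrow::Cow;

/// Unescape the body of an RCS string (text between the outer '@'s).
/// Borrows the input when there is no "@@" to collapse, and only
/// allocates an owned String when unescaping is actually required.
/// (Hypothetical helper for illustration only.)
fn unescape(body: &str) -> Cow<'_, str> {
    if body.contains("@@") {
        // str::replace scans left to right without overlapping matches,
        // so "@@@@" becomes "@@".
        Cow::Owned(body.replace("@@", "@"))
    } else {
        Cow::Borrowed(body)
    }
}
```

With this, unescape("A String") stays borrowed, while unescape("A @@ String") yields an owned "A @ String".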

Are there any built-in nom macros that could help with this parsing?

#[macro_use]
extern crate nom;
use std::str;
use std::borrow::Cow;
use nom::*;

/// Parse an RCS String
fn string<'a>(input: &'a[u8]) -> IResult<&'a[u8], Cow<'a, str>> {
    let len = input.len();
    if len < 1 {
        return IResult::Incomplete(Needed::Unknown);
    }
    if input[0] != b'@' {
        return IResult::Error(Err::Code(ErrorKind::Custom(0)));
    }
    // start of current chunk. Chunk is a piece of unescaped input
    let mut start = 1;
    // current char index in input
    let mut i = start;
    // FIXME only need to allocate if input turned out to need unescaping
    let mut s: String = String::new();
    // Was the input escaped?
    let mut escaped = false;
    while i < len {
        // Check for end delimiter
        if input[i] == b'@' {
            // if there's another @ then it is an escape sequence
            if i + 1 < len && input[i + 1] == b'@' {
                // escaped @
                i += 1; // want to include the first @ in the output
                s.push_str(str::from_utf8(&input[start .. i]).unwrap());
                start = i + 1;
                escaped = true;
            } else {
                // end of string
                let result = if escaped {
                    s.push_str(str::from_utf8(&input[start .. i]).unwrap());
                    Cow::Owned(s)
                } else {
                    Cow::Borrowed(str::from_utf8(&input[1 .. i]).unwrap())
                };
                return IResult::Done(&input[i + 1 ..], result);
            }
        }
        i += 1;
    }
    IResult::Incomplete(Needed::Unknown)
}

Answer 1

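It looks like the way to use the nom library is through its macro combinators. A quick browse of the source code turns up some nice examples of parsers, including ones that parse strings with escaped characters. This is what I came up with: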

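As you can see, I just copy the bytes into a vector using Vec::extend. You could be more sophisticated here and return a Cow byte slice if you wish.

Unfortunately, the escaped! macro does not seem to be useful in this case, because it does not appear to work when the terminator is the same as the escape character (which is actually a pretty common case).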
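For readers who want to see the copy-into-a-vector idea in isolation, here is a sketch in plain Rust. It is not the nom combinator code from this answer and uses no nom macros. The function name rcs_string and the Option-based return type are assumptions made for illustration:

```rust
/// Parse one @-delimited RCS string from the front of `input`.
/// Returns the remaining input and the unescaped bytes, or None if the
/// input does not start with '@' or the closing '@' is missing.
/// (Hypothetical helper; plain Rust, not a nom combinator.)
fn rcs_string(input: &[u8]) -> Option<(&[u8], Vec<u8>)> {
    if input.first() != Some(&b'@') {
        return None;
    }
    let mut out = Vec::new();
    let mut start = 1; // start of the current unescaped chunk
    let mut i = 1; // current byte index
    while i < input.len() {
        if input[i] == b'@' {
            // Copy the chunk before this '@' into the output.
            out.extend(&input[start..i]);
            if input.get(i + 1) == Some(&b'@') {
                // "@@" is an escaped '@': emit one '@' and skip both bytes.
                out.push(b'@');
                i += 2;
                start = i;
                continue;
            }
            // A lone '@' closes the string.
            return Some((&input[i + 1..], out));
        }
        i += 1;
    }
    None // ran out of input before the closing '@'
}
```

Calling rcs_string(b"@A @@ String@ rest") gives the remaining input b" rest" and the bytes of "A @ String". Returning Vec<u8> keeps the sketch independent of the text encoding; the trade-off is that it always builds a new vector, even when no escape sequence occurs.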

Nom Parser To Unescape String
