What is the simplest algorithm for escaping a single character?

Posted: 2012-06-14 13:01:06

Tags: algorithm language-agnostic escaping

I'm trying to write two functions, escape(text, delimiter) and unescape(text, delimiter), with the following properties:

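  1. The result of escape does not contain delimiter.

  2. unescape is the inverse of escape, i.e.

    unescape(escape(text, delimiter), delimiter) == text

    for all values of text and delimiter.

  3. It is OK to restrict the allowed values of delimiter.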


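Background: I want to build a string of delimiter-separated values. To be able to extract the same list from that string again, I have to make sure that the individual strings do not contain the delimiter.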


What I tried: I came up with a simple solution (pseudocode):

    escape(text, delimiter):   return text.Replace("\", "\\").Replace(delimiter, "\d")
    unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\\", "\")
    

but found that property 2 fails for the test string "\d<delimiter>". Currently I have the following working solution:

    escape(text, delimiter):   return text.Replace("\", "\b").Replace(delimiter, "\d")
    unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\b", "\")
    

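This seems to work as long as delimiter is not \, b or d (which is fine; I don't want to use those as delimiters anyway). However, since I haven't formally proven it correct, I'm afraid I've missed a case where one of the properties is violated. Since this is such a common problem, I assume a "well-known, provably correct" algorithm already exists, hence my question (see the title).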
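Short of a formal proof, a cheap way to gain confidence is to check properties 1 and 2 exhaustively on every short string over an alphabet containing the characters involved. The sketch below does this in C#, assuming a fixed delimiter ";" and string-based Replace calls as in the pseudocode; EscapeChecker and its method names are hypothetical. Over an alphabet such as "\\bdx;" it reports counterexamples for the first scheme (for example "\d;") and none for the second. That is evidence over the tested alphabet and lengths, not a proof.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: brute-force check of the two properties from the question.
public static class EscapeChecker
{
    // First attempt: "\" -> "\\", delimiter -> "\d", undone by two Replace calls.
    public static string Escape1(string t, string d) => t.Replace("\\", "\\\\").Replace(d, "\\d");
    public static string Unescape1(string t, string d) => t.Replace("\\d", d).Replace("\\\\", "\\");

    // Second attempt: "\" -> "\b", delimiter -> "\d".
    public static string Escape2(string t, string d) => t.Replace("\\", "\\b").Replace(d, "\\d");
    public static string Unescape2(string t, string d) => t.Replace("\\d", d).Replace("\\b", "\\");

    // Returns every string of length <= maxLength over `alphabet` that violates
    // property 1 (escaped text contains the delimiter) or property 2 (no round trip).
    public static List<string> FindCounterexamples(
        Func<string, string, string> escape,
        Func<string, string, string> unescape,
        string delimiter, string alphabet, int maxLength)
    {
        var failures = new List<string>();
        var current = new List<string> { "" };
        for (int len = 0; len <= maxLength; len++)
        {
            var next = new List<string>();
            foreach (string text in current)
            {
                string escaped = escape(text, delimiter);
                bool ok = !escaped.Contains(delimiter)             // property 1
                          && unescape(escaped, delimiter) == text; // property 2
                if (!ok) failures.Add(text);
                foreach (char c in alphabet) next.Add(text + c); // strings one char longer
            }
            current = next;
        }
        return failures;
    }
}
```

For example, FindCounterexamples(EscapeChecker.Escape2, EscapeChecker.Unescape2, ";", "\\bdx;", 4) returns an empty list, while the same call with Escape1/Unescape1 does not.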

2 Answers:

Answer 0 (score: 4)

Your first algorithm is correct.

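The bug is in your implementation of unescape(): you have to replace \d with delimiter and \\ with \ in the same pass. You cannot do it with several successive calls to Replace() like this, because the second call can match characters that the first call produced or left behind.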
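To make the single-pass point concrete, here is a minimal C# sketch of the question's first scheme ("\" -> "\\", delimiter -> "\d") with a hand-written decoder. The class name BackslashEscaping is hypothetical, and it assumes a single-character delimiter other than '\' and 'd' (either of those would reappear in the escaped output). The decoder treats every backslash as the start of a two-character sequence and consumes both characters at once.

```csharp
using System;
using System.Text;

// Hypothetical sketch: the question's first scheme, decoded in one left-to-right pass.
public static class BackslashEscaping
{
    static void CheckDelimiter(char delimiter)
    {
        // '\' and 'd' occur in escape sequences, so they cannot be delimiters.
        if (delimiter == '\\' || delimiter == 'd')
            throw new ArgumentException("delimiter must not be '\\' or 'd'");
    }

    public static string Escape(string text, char delimiter)
    {
        CheckDelimiter(delimiter);
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\\') sb.Append("\\\\");           // "\" -> "\\"
            else if (c == delimiter) sb.Append("\\d");  // delimiter -> "\d"
            else sb.Append(c);
        }
        return sb.ToString();
    }

    public static string Unescape(string text, char delimiter)
    {
        CheckDelimiter(delimiter);
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c != '\\') { sb.Append(c); continue; }

            // A backslash always starts an escape sequence: consume the next
            // character now, so it can never be reinterpreted later.
            if (i + 1 >= text.Length)
                throw new ArgumentException("dangling escape character");
            char next = text[++i];
            if (next == '\\') sb.Append('\\');
            else if (next == 'd') sb.Append(delimiter);
            else throw new ArgumentException("unknown escape sequence \\" + next);
        }
        return sb.ToString();
    }
}
```

With this decoder, the failing input "\d;" escapes to "\\d\d" and unescapes back to "\d;": the leading "\\" is consumed as a unit, so the following "d" is never paired with a backslash.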

Here is some sample C# code for safely quoting delimiter-separated strings:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    // Extension methods must live in a static class; the class name is arbitrary.
    public static class SeparatorQuoting
    {
        static string QuoteSeparator(string str,
            char separator, char quoteChar, char otherChar) // "~" -> "~~"     ";" -> "~s"
        {
            var sb = new StringBuilder(str.Length);
            foreach (char c in str)
            {
                if (c == quoteChar)
                {
                    sb.Append(quoteChar);
                    sb.Append(quoteChar);
                }
                else if (c == separator)
                {
                    sb.Append(quoteChar);
                    sb.Append(otherChar);
                }
                else
                {
                    sb.Append(c);
                }
            }
            return sb.ToString(); // no separator in the result -> Join/Split is safe
        }

        static string UnquoteSeparator(string str,
            char separator, char quoteChar, char otherChar) // "~~" -> "~"     "~s" -> ";"
        {
            var sb = new StringBuilder(str.Length);
            bool isQuoted = false; // true right after a quoteChar that starts a sequence
            foreach (char c in str)
            {
                if (isQuoted)
                {
                    // Second character of a two-character sequence, decoded in the same pass.
                    if (c == otherChar)
                        sb.Append(separator);
                    else
                        sb.Append(c);
                    isQuoted = false;
                }
                else
                {
                    if (c == quoteChar)
                        isQuoted = true;
                    else
                        sb.Append(c);
                }
            }
            if (isQuoted)
                throw new ArgumentException("input string is not correctly quoted");
            return sb.ToString(); // ";" are restored
        }

        /// <summary>
        /// Encodes the given strings as a single string.
        /// </summary>
        /// <param name="input">The strings.</param>
        /// <param name="separator">The separator.</param>
        /// <param name="quoteChar">The quote char.</param>
        /// <param name="otherChar">The other char.</param>
        /// <returns>The quoted strings, joined by the separator.</returns>
        public static string QuoteAndJoin(this IEnumerable<string> input,
            char separator = ';', char quoteChar = '~', char otherChar = 's')
        {
            // Stands in for the original CommonHelper.CheckNullReference helper, which is not shown.
            if (input == null)
                throw new ArgumentNullException("input");
            if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
                throw new ArgumentException("cannot quote: ambiguous format");
            return string.Join(new string(separator, 1), (from str in input select QuoteSeparator(str, separator, quoteChar, otherChar)).ToArray());
        }

        /// <summary>
        /// Decodes the strings encoded in a single string.
        /// </summary>
        /// <param name="encoded">The encoded string.</param>
        /// <param name="separator">The separator.</param>
        /// <param name="quoteChar">The quote char.</param>
        /// <param name="otherChar">The other char.</param>
        /// <returns>The decoded strings.</returns>
        public static IEnumerable<string> SplitAndUnquote(this string encoded,
            char separator = ';', char quoteChar = '~', char otherChar = 's')
        {
            // Stands in for the original CommonHelper.CheckNullReference helper, which is not shown.
            if (encoded == null)
                throw new ArgumentNullException("encoded");
            if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
                throw new ArgumentException("cannot unquote: ambiguous format");
            return from s in encoded.Split(separator) select UnquoteSeparator(s, separator, quoteChar, otherChar);
        }
    }
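One edge case worth knowing about: string.Join over an empty sequence returns "", and "".Split(';') returns a single empty string, so QuoteAndJoin maps both an empty list and a list holding one empty string to "", and an empty list comes back from SplitAndUnquote as one empty string. A minimal sketch of one workaround, assuming you are free to change the format, is to terminate every field with the separator instead of joining with it. The class and method names (TerminatedList, Encode, Decode) are hypothetical; the ';', '~' and 's' defaults mirror the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical variant: every field is followed by the separator, so the empty
// list encodes to "" and a list with one empty string encodes to ";".
public static class TerminatedList
{
    static void CheckChars(char separator, char quoteChar, char otherChar)
    {
        if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
            throw new ArgumentException("ambiguous format");
    }

    public static string Encode(IEnumerable<string> fields,
        char separator = ';', char quoteChar = '~', char otherChar = 's')
    {
        CheckChars(separator, quoteChar, otherChar);
        var sb = new StringBuilder();
        foreach (string field in fields)
        {
            foreach (char c in field)
            {
                if (c == quoteChar) sb.Append(quoteChar).Append(quoteChar);      // "~" -> "~~"
                else if (c == separator) sb.Append(quoteChar).Append(otherChar); // ";" -> "~s"
                else sb.Append(c);
            }
            sb.Append(separator); // terminator after every field
        }
        return sb.ToString();
    }

    public static List<string> Decode(string encoded,
        char separator = ';', char quoteChar = '~', char otherChar = 's')
    {
        CheckChars(separator, quoteChar, otherChar);
        var result = new List<string>();
        var field = new StringBuilder();
        bool isQuoted = false;
        foreach (char c in encoded)
        {
            if (isQuoted)
            {
                if (c == quoteChar) field.Append(quoteChar);
                else if (c == otherChar) field.Append(separator);
                else throw new ArgumentException("unknown escape sequence");
                isQuoted = false;
            }
            else if (c == quoteChar) isQuoted = true;
            else if (c == separator) { result.Add(field.ToString()); field.Clear(); }
            else field.Append(c);
        }
        // Anything left over means the last field was not terminated.
        if (isQuoted || field.Length > 0)
            throw new ArgumentException("input is not correctly terminated");
        return result;
    }
}
```

The cost is one extra character per field; in exchange, every list, including the empty one, has exactly one encoding.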

Answer 1 (score: 0)

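When the delimiter is \, b or d, perhaps you could handle that case with an alternative replacement, and use the same alternative replacement in the unescape algorithm.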
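A minimal C# sketch of that idea applied to the question's second scheme: normally "\" -> "\b" and delimiter -> "\d", but if the delimiter is \, b or d, switch to a different escape character and code letters. The fallback set ('~', 'q', 's') and the class name AlternativeEscaping are arbitrary, hypothetical choices; the only requirement is that the fallback characters do not collide with any delimiter that triggers it.

```csharp
using System;

// Hypothetical sketch: the question's second scheme with an alternative
// character set chosen whenever the delimiter collides with '\', 'b' or 'd'.
public static class AlternativeEscaping
{
    // Escape character, code for the escape character, code for the delimiter.
    static (char Esc, char EscCode, char DelimCode) Choose(char delimiter) =>
        delimiter == '\\' || delimiter == 'b' || delimiter == 'd'
            ? ('~', 'q', 's')    // alternative replacement (arbitrary choice)
            : ('\\', 'b', 'd');  // the question's original replacement

    public static string Escape(string text, char delimiter)
    {
        var (esc, escCode, delimCode) = Choose(delimiter);
        string e = esc.ToString();
        return text.Replace(e, e + escCode)                        // escape char first
                   .Replace(delimiter.ToString(), e + delimCode);  // then the delimiter
    }

    public static string Unescape(string text, char delimiter)
    {
        // The choice depends only on the delimiter, so both directions agree on it.
        var (esc, escCode, delimCode) = Choose(delimiter);
        string e = esc.ToString();
        return text.Replace(e + delimCode, delimiter.ToString())
                   .Replace(e + escCode, e);
    }
}
```

Because Choose depends only on the delimiter, escape and unescape always pick the same set, which is what "use the same alternative replacement in the unescape algorithm" requires.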