More efficient way to split balanced strings in C#

Date: 2012-04-11 17:28:36

Tags: c# string string-parsing

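I did this previously with a balancing-group regular expression, back when I only had one pair of balancing characters. With more pairs, though, it becomes much more complicated and ugly.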
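
For context, a .NET balancing-group pattern for a single pair of parentheses looks roughly like the sketch below (the class and method names are made up for illustration). Each `(` pushes a capture onto the `open` group, each `)` pops one, and `(?(open)(?!))` rejects the match if anything is still open. Every additional pair, plus quotes and escapes, needs its own alternatives and stack, which is where this approach gets unwieldy.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedParens
{
    // Matches a "(...)" group whose inner parentheses are balanced.
    // (?<open>\() pushes a capture, (?<-open>\)) pops one (and fails if the
    // stack is empty), and (?(open)(?!)) fails the match if any are left.
    private static readonly Regex Group = new Regex(
        @"\((?>[^()]+|(?<open>\()|(?<-open>\)))*(?(open)(?!))\)");

    // Returns every top-level balanced "(...)" group in the input.
    public static string[] FindGroups(string input)
    {
        MatchCollection matches = Group.Matches(input);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

For example, `FindGroups("a (b (c) d) e (f)")` returns the two groups `(b (c) d)` and `(f)`.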

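For my current purposes I instead wrote a method that tokenizes the string, but it is very slow (and extremely inefficient). The most expensive part seems to be my gratuitous use of `Substring` (yes, I know it's bad).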
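
One way to avoid those per-step allocations, sketched here under the assumption that all you need is "does this delimiter start at offset `i`?", is to compare in place instead of slicing the string. The `string.Compare(strA, indexA, strB, indexB, length, comparisonType)` overload compares a range of one string with another without creating a substring. `StartsWithAt` is a hypothetical helper name, not an existing API.

```csharp
using System;

public static class TokenMatch
{
    // True if `token` occurs in `s` starting at `index`.
    // The range comparison happens in place, so no substring is allocated.
    public static bool StartsWithAt(string s, int index, string token,
                                    StringComparison comparison = StringComparison.Ordinal)
    {
        if (index < 0 || index + token.Length > s.Length)
        {
            return false; // token would run past the end of s
        }
        return string.Compare(s, index, token, 0, token.Length, comparison) == 0;
    }
}
```

Advancing an integer index and calling a check like this at each position replaces the `s = s.Substring(...)` pattern, which copies the whole remainder of the string on every step and therefore makes the scan quadratic.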

Basically, I want to take the following:

  hello("(abc d)", efg (hijk)) and,some more<%lmn, "o(\")pq", (xy(z))%>

and end up with:

hello("(abc d)", efg (hijk)) 
[space] (the actual character)
and
,
some more
<%lmn, "o(\")pq", (xy(z))%>

In other words, I am splitting on the following (and I want them included in the resulting array):

[space]
,

...where I have these "balanced grouping strings":

" "
( )
<% %> 

...and this escape character:

\
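
The escape rule can be stated as a parity check: a character is escaped when it is preceded by an odd number of consecutive escape characters, so in `\"` the quote is escaped, but in `\\"` it is not (the two backslashes escape each other). This is the same behavior the `atEscapedCharacter` toggle in the code below produces inside a group. A minimal sketch, where `IsEscaped` is a hypothetical helper:

```csharp
using System;

public static class EscapeCheck
{
    // The character at `index` is escaped if an odd number of consecutive
    // escape characters immediately precede it.
    public static bool IsEscaped(string s, int index, char escape = '\\')
    {
        int count = 0;
        for (int i = index - 1; i >= 0 && s[i] == escape; i--)
        {
            count++;
        }
        return count % 2 == 1;
    }
}
```

In `o(\")pq`, the quote at index 3 is escaped. In `a\\"`, the quote at index 3 is not.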

I don't want to write a full-blown parser for this...

Here is the code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    public static class StringExtensions
    {
        public static IEnumerable<string> SplitNotEnclosed(this string s, IEnumerable<string> separators, Dictionary<string, string> enclosingValues = null, IEnumerable<char> escapeCharacters = null, bool includeSeparators = false, StringComparison comparisonType = StringComparison.Ordinal)
        {
            var results = new List<string>();

            var enclosureStack = new Stack<KeyValuePair<string, string>>();
            bool atEscapedCharacter = false;

            if (escapeCharacters == null) escapeCharacters = new[] { '\\' };
            if (enclosingValues == null) enclosingValues = new[] { "\"" }.ToDictionary(i => i);

            // Longest delimiters first, so "<%" is tried before any shorter token.
            var orderedEnclosingValues = enclosingValues.OrderByDescending(i => i.Key.Length).ToArray();
            separators = separators.OrderByDescending(v => v.Length).ToArray();

            var currentPart = new StringBuilder();

            while (s.Length > 0)
            {
                int addToIndex;

                var newEnclosingValue = orderedEnclosingValues.FirstOrDefault(v => s.StartsWith(v.Key, comparisonType));

                if (enclosureStack.Count > 0 && !atEscapedCharacter && s.StartsWith(enclosureStack.Peek().Value, comparisonType))
                {
                    // Closing delimiter of the innermost open group.
                    addToIndex = enclosureStack.Pop().Value.Length;
                }
                else if (newEnclosingValue.Key != null && !atEscapedCharacter)
                {
                    // Opening delimiter; groups can nest (also inside quotes).
                    enclosureStack.Push(newEnclosingValue);
                    addToIndex = newEnclosingValue.Key.Length;
                }
                else if (escapeCharacters.Contains(s[0]) && enclosureStack.Count > 0)
                {
                    // Escapes are only honored inside a group; a doubled escape cancels out.
                    atEscapedCharacter = !atEscapedCharacter;
                    addToIndex = 1;
                }
                else if (enclosureStack.Count > 0)
                {
                    atEscapedCharacter = false;
                    addToIndex = 1;
                }
                else
                {
                    // Outside any group: split here if a separator starts at this position.
                    // (Only checked when no delimiter was consumed in this step.)
                    string separator = separators.FirstOrDefault(v => s.StartsWith(v, comparisonType));

                    if (separator != null)
                    {
                        if (currentPart.Length > 0) results.Add(currentPart.ToString());
                        // Add separators only on request; the old Except() call also removed duplicate parts.
                        if (includeSeparators) results.Add(separator);
                        s = s.Substring(separator.Length);
                        currentPart.Clear();
                        continue;
                    }

                    addToIndex = 1;
                }

                // Each Substring call copies the rest of the string, so this loop is O(n^2).
                currentPart.Append(s.Substring(0, addToIndex));
                s = s.Substring(addToIndex);
            }

            if (currentPart.Length > 0) results.Add(currentPart.ToString());

            return results.ToArray();
        }
    }
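
A minimal sketch of an index-based variant, assuming the same rules as the method above: escapes are only honored inside a group, the innermost closing delimiter is checked before any opener, and an opener does not start a new part. This is not the asker's code. `SplitSketch`, `SplitNotEnclosedFast`, `MatchesAt` and `IndexOfMatch` are hypothetical names, the sketch takes a single escape character, and it assumes every separator and delimiter is non-empty. Instead of slicing the string at every step, it advances an index, compares delimiters in place, and allocates only when it emits a part (plus the up-front sorting).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Assumes all separators and delimiters are non-empty (an empty one would never advance i).
    public static List<string> SplitNotEnclosedFast(
        this string s,
        IEnumerable<string> separators,
        IDictionary<string, string> enclosingValues,
        char escapeCharacter = '\\',
        bool includeSeparators = false,
        StringComparison comparison = StringComparison.Ordinal)
    {
        // Longest tokens first, so "<%" is tried before any shorter token.
        string[] seps = separators.OrderByDescending(v => v.Length).ToArray();
        var ordered = enclosingValues.OrderByDescending(kv => kv.Key.Length).ToArray();
        string[] openKeys = ordered.Select(kv => kv.Key).ToArray();
        string[] openValues = ordered.Select(kv => kv.Value).ToArray();

        var results = new List<string>();
        var closers = new Stack<string>(); // closing delimiters still expected
        bool escaped = false;
        int partStart = 0;                 // start index of the current part
        int i = 0;

        while (i < s.Length)
        {
            if (closers.Count > 0)
            {
                // Inside a group: an escaped character is taken literally,
                // and the innermost closer is checked before any opener.
                if (escaped) { escaped = false; i++; continue; }
                if (s[i] == escapeCharacter) { escaped = true; i++; continue; }
                if (MatchesAt(s, i, closers.Peek(), comparison)) { i += closers.Pop().Length; continue; }
            }

            int o = IndexOfMatch(s, i, openKeys, comparison);
            if (o >= 0)
            {
                closers.Push(openValues[o]);
                i += openKeys[o].Length;
                continue;
            }

            if (closers.Count == 0)
            {
                int k = IndexOfMatch(s, i, seps, comparison);
                if (k >= 0)
                {
                    if (i > partStart) results.Add(s.Substring(partStart, i - partStart));
                    if (includeSeparators) results.Add(seps[k]);
                    i += seps[k].Length;
                    partStart = i;
                    continue;
                }
            }

            i++;
        }

        if (i > partStart) results.Add(s.Substring(partStart, i - partStart));
        return results;
    }

    // Index of the first token that occurs in s at position index, or -1.
    private static int IndexOfMatch(string s, int index, string[] tokens, StringComparison comparison)
    {
        for (int t = 0; t < tokens.Length; t++)
        {
            if (MatchesAt(s, index, tokens[t], comparison)) return t;
        }
        return -1;
    }

    // Compares in place; no substring is created.
    private static bool MatchesAt(string s, int index, string token, StringComparison comparison)
    {
        return index + token.Length <= s.Length
            && string.Compare(s, index, token, 0, token.Length, comparison) == 0;
    }
}
```

Called on the sample input with separators `" "` and `","`, the three delimiter pairs, and `includeSeparators: true`, this yields seven parts: `hello("(abc d)", efg (hijk))`, a space, `and`, `,`, `some`, a space, and `more<%lmn, "o(\")pq", (xy(z))%>`. A space outside any group is a separator, and an opener does not start a new part, so `some more` and the `<%...%>` group do not come out exactly as in the expected output above. The method above applies the same rules. The remaining per-character cost is a linear scan over the delimiter lists, which is cheap for a handful of delimiters.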
