Clean up a string: replace runs of consecutive non-alphanumeric characters with a single separator

Time: 2016-11-28 13:06:16

Tags: c# string linq

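I have a string that needs to be formatted:

  • Keep the alphanumeric characters
  • Replace each run of one or more non-alphanumeric characters with a single separator

I came up with this: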

string Format( string str , string separator )
{
    if( string.IsNullOrEmpty( str ) )
        return string.Empty;

    var words = new List<string>();
    var sb = new StringBuilder();

    foreach( var c in str.ToCharArray() )
    {
        if( char.IsLetterOrDigit( c ) )
        {
            sb.Append( c );
        }
        else if( sb.Length > 0 )
        {
            words.Add( sb.ToString() );
            sb.Clear();
        }
    }

    // StringBuilder is not an IEnumerable<char>, so check Length instead of Any()
    if( sb.Length > 0 )
        words.Add( sb.ToString() );

    return string.Join( separator , words );
}

Is there a better / more LINQ-like / shorter / more performant solution than this (without using regex)?

3 answers:

Answer 0: (score: 2)

You can go "low level" and take advantage of the fact that a string is an IEnumerable<char> by working with its GetEnumerator directly:

string Format(string str, string separator)
{
    var builder = new StringBuilder (str.Length);

    using (var e = str.GetEnumerator ())
    {
        while (e.MoveNext ())
        {
            bool hasMoved = true;

            if (!char.IsLetterOrDigit (e.Current))
            {
                while ((hasMoved = e.MoveNext ()) && !char.IsLetterOrDigit (e.Current))
                    ;
                builder.Append (separator);
            }

            if (hasMoved)
                builder.Append (e.Current);
        }
    }

    return builder.ToString ();
}
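For what it's worth, here is a small sketch of how this version behaves at the string boundaries. The BoundaryDemo class and the sample inputs are made up for illustration; Format is a copy of the method above. Unlike the code in the question, a leading or trailing run of non-alphanumeric characters also produces a separator here:

```csharp
using System;
using System.Text;

static class BoundaryDemo
{
    // Copy of the enumerator-based Format shown above.
    public static string Format(string str, string separator)
    {
        var builder = new StringBuilder(str.Length);

        using (var e = str.GetEnumerator())
        {
            while (e.MoveNext())
            {
                bool hasMoved = true;

                if (!char.IsLetterOrDigit(e.Current))
                {
                    // Skip the rest of the non-alphanumeric run.
                    while ((hasMoved = e.MoveNext()) && !char.IsLetterOrDigit(e.Current))
                        ;
                    builder.Append(separator);
                }

                if (hasMoved)
                    builder.Append(e.Current);
            }
        }

        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Format("a,,b", "_"));     // a_b
        Console.WriteLine(Format("--a, b--", "_")); // _a_b_
        Console.WriteLine(Format("!!!", "_"));      // _
    }
}
```

If the output has to match the question's code exactly (no separator at either end), the separator would need to be deferred until the next alphanumeric character is seen.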

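Just in case, here is a regex version:

// Matches each run of characters that are not word characters, plus '_' (which \w would otherwise keep).
private static readonly Regex rgx = new Regex(@"[\W_]+", RegexOptions.Compiled);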

string Format (string str, string separator)
{
    return rgx.Replace (str, separator);
}
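Note that the regex version above also replaces leading and trailing runs with a separator. A possible alternative, sketched here under the assumption that the result should match the question's code (no separator at either end), is to extract the alphanumeric runs and join them. The class name RegexJoinSketch and the field name Runs are hypothetical; \p{L} and \p{Nd} should cover the same Unicode categories that char.IsLetterOrDigit accepts:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class RegexJoinSketch
{
    // \p{L} = letters, \p{Nd} = decimal digits.
    private static readonly Regex Runs = new Regex(@"[\p{L}\p{Nd}]+", RegexOptions.Compiled);

    public static string Format(string str, string separator)
    {
        if (string.IsNullOrEmpty(str))
            return string.Empty;

        // Collect every alphanumeric run and join the runs with the separator.
        return string.Join(separator, Runs.Matches(str).Cast<Match>().Select(m => m.Value));
    }
}
```

The Cast<Match>() call is there so the sketch also compiles on older frameworks, where MatchCollection only implements the non-generic IEnumerable.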

Addendum regarding the OP's comment about a LINQ one-liner:
It is possible, but hardly "easy to understand".

Using an anonymous type:

string Format (string str, string separator)
{
    return str.Aggregate (new { builder = new StringBuilder (str.Length), prevDiscarded = false }, (state, ch) => char.IsLetterOrDigit (ch) ? new { builder = (state.prevDiscarded ? state.builder.Append (separator) : state.builder).Append (ch), prevDiscarded = false } : new { state.builder, prevDiscarded = true }, state => (state.prevDiscarded ? state.builder.Append (separator) : state.builder).ToString ());
}

Using a Tuple instead:

string Format (string str, string separator)
{
    return str.Aggregate (Tuple.Create (new StringBuilder (str.Length), false), (state, ch) => char.IsLetterOrDigit (ch) ? Tuple.Create ((state.Item2 ? state.Item1.Append (separator) : state.Item1).Append (ch), false) : Tuple.Create (state.Item1, true), state => (state.Item2 ? state.Item1.Append (separator) : state.Item1).ToString ());
}

With the Tuple, we can add a few helpers to "ease" (arguably) readability [although technically it is no longer a one-liner]:

//top of file
using State = System.Tuple<System.Text.StringBuilder, bool>;

string Format (string str, string separator)
{
    var initialState = Tuple.Create (new StringBuilder (str.Length), false);

    Func<State, StringBuilder> addSeparatorIfPrevDiscarded = state => state.Item2 ? state.Item1.Append (separator) : state.Item1;
    Func<State, char, State> aggregator = (state, ch) => char.IsLetterOrDigit (ch) ? Tuple.Create (addSeparatorIfPrevDiscarded (state).Append (ch), false) : Tuple.Create (state.Item1, true);
    Func<State, string> resultSelector = state => addSeparatorIfPrevDiscarded (state).ToString ();

    return str.Aggregate (initialState, aggregator, resultSelector);
}

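What makes it complicated is that (IMO) LINQ* is not well suited to cases where an item's output depends on the previous (or next) item in the same collection.
* LINQ can handle it, but it quickly gets noisy with all the Func and anonymous type/tuple syntax (maybe C# 7.0 will change that a bit).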
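As a rough idea of what that could look like, here is a sketch using C# 7.0 value tuples as the Aggregate state. It assumes a compiler and framework with System.ValueTuple support; the class name ValueTupleSketch and the element names Builder and PrevDiscarded are made up for illustration. The behavior is meant to mirror the Tuple version above, including the separator emitted for leading and trailing runs:

```csharp
using System;
using System.Linq;
using System.Text;

static class ValueTupleSketch
{
    public static string Format(string str, string separator)
    {
        return str.Aggregate(
            // State: the output so far, plus whether the previous char was skipped.
            (Builder: new StringBuilder(str.Length), PrevDiscarded: false),
            (state, ch) =>
            {
                if (!char.IsLetterOrDigit(ch))
                    return (state.Builder, true);

                // First alphanumeric char after a skipped run: emit the separator once.
                if (state.PrevDiscarded)
                    state.Builder.Append(separator);

                return (state.Builder.Append(ch), false);
            },
            // A trailing skipped run still gets its separator.
            state => (state.PrevDiscarded ? state.Builder.Append(separator) : state.Builder).ToString());
    }
}
```

The named tuple elements remove the Item1/Item2 noise, but the state threading itself is unchanged, so this is arguably still harder to read than a plain loop.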

In the same vein, one could also accept side effects and keep only a bool as the state:

string Format (string str, string separator)
{
    var builder = new StringBuilder (str.Length);

    Action<bool> addSeparatorIfPrevDiscarded = prevDiscarded => { if (prevDiscarded) builder.Append (separator); };
    Func<bool, char, bool> aggregator = (prevDiscarded, ch) => {
        if (char.IsLetterOrDigit (ch)) {
            addSeparatorIfPrevDiscarded (prevDiscarded);
            builder.Append (ch);
            return false;
        }

        return true;
    };

    addSeparatorIfPrevDiscarded (str.Aggregate (false, aggregator));

    return builder.ToString ();
}

Answer 1: (score: 1)

Something like this avoids both the List<string> and the string.Join. It also compiles.

string Format(string str, char separator)
{
    if (string.IsNullOrEmpty(str))
        return string.Empty;

    var sb = new StringBuilder();
    bool previousWasNonAlphaNum = false;

    foreach (var c in str)
    {
        if (char.IsLetterOrDigit(c))
        {
            // Emit the separator only between words, never before the first one.
            if (previousWasNonAlphaNum && sb.Length > 0)
                sb.Append(separator);
            sb.Append(c);
        }

        previousWasNonAlphaNum = !char.IsLetterOrDigit(c);
    }

    return sb.ToString();
}

Answer 2: (score: 0)

Try this, it will work:

    string Format(string str, string separator)
    {
        var delimiter = char.Parse(separator);
        var replaced = false;
        var cArray = str.Select(c =>
        {                
            if (!char.IsLetterOrDigit(c) & !replaced)
            {
                replaced = true;
                return delimiter;
            }
            else if (char.IsLetterOrDigit(c))
            {
                replaced = false;                    
            }
            else
            {
                return ' ';
            }
            return c;

        }).ToArray();

        return new string(cArray).Replace(" ","");
    }

Or you can try the following:

    string Format(string str, string separator)
    {
        var delimiter = char.Parse(separator);
        var cArray = str.Select(c => !char.IsLetterOrDigit(c) ? delimiter : c).ToArray();
        var wlist = new string(cArray).Split(new string[]{separator}, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(separator, wlist);
    }
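One thing to keep in mind with both variants: char.Parse throws a FormatException unless separator is exactly one character long. A sketch of a similar split-and-join approach that accepts any separator string is shown below; the class name SplitJoinSketch is hypothetical, and the idea is simply to split on every non-alphanumeric character that actually occurs in the input:

```csharp
using System;
using System.Linq;

static class SplitJoinSketch
{
    public static string Format(string str, string separator)
    {
        if (string.IsNullOrEmpty(str))
            return string.Empty;

        // Every distinct non-alphanumeric character in the input becomes a split character.
        char[] splitChars = str.Where(c => !char.IsLetterOrDigit(c)).Distinct().ToArray();

        // With no split characters, Split falls back to whitespace, which cannot occur
        // here because whitespace is itself non-alphanumeric; the input comes back whole.
        string[] words = str.Split(splitChars, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(separator, words);
    }
}
```

Like the code in the question, this produces no separator for leading or trailing runs, at the cost of two passes over the input and an intermediate array.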