How do I split a string and then rejoin it?

Asked: 2012-05-09 22:59:29

Tags: c# string join split

I have the following string in C#.

"aaa,bbbb.ccc|dddd:eee"

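I then split it with new char[] {',','.','|',':'}. How can I rejoin the string using the same characters, in the same order as before, so that the result has exactly the same layout as the original?

Example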

string s = "aaa,bbbb.ccc|dddd:eee";
string[] s2 = s.Split(new char[] {',','.','|',':'});
// now s2 = {"aaa", "bbbb", "ccc", "dddd", "eee"}
// let's assume I have done some operation, and
// now s2 = {"xxx", "yyy", "zzz", "1111", "222"}

s = s2.MagicJoin(~~~~~~);  // I need this

// now s = "xxx,yyy.zzz|1111:222";

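Edit

The char[] in the example above is just a sample. In the real world the separators will not be in that order, and they may not even all appear in the same string.

Edit

Just a thought: how about using Regex.Split, first splitting by the char[] to get one string[], then splitting by everything that is not in the char[] to get another string[], and later simply putting them back together? It might work, but I don't know how to code it.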
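One way the idea above might be coded is sketched below. It is a variation rather than the two-split approach exactly as described: a single Regex.Split call with a capturing group, which makes .NET include the matched delimiters in the result array, so pieces land on even indexes and delimiters on odd indexes. The class and method names (SplitKeepingDelimiters, SplitWithDelimiters, Rejoin) are made up for illustration, and the delimiter set is hard-coded to the four characters from the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitKeepingDelimiters
{
    // Hypothetical helper: splits on , . | : and keeps each delimiter in the
    // result, because Regex.Split includes text matched by a capturing group.
    public static string[] SplitWithDelimiters(string input)
    {
        return Regex.Split(input, @"([,.|:])");
    }

    // Overwrites the pieces (even indexes) with new values, leaves the
    // delimiters (odd indexes) untouched, then concatenates everything.
    public static string Rejoin(string[] tokens, string[] newPieces)
    {
        if (newPieces.Length != (tokens.Length + 1) / 2)
            throw new ArgumentException("Expected one new value per piece.", "newPieces");

        var result = (string[])tokens.Clone();
        for (int i = 0; i < newPieces.Length; i++)
            result[2 * i] = newPieces[i];
        return string.Concat(result);
    }
}
```

With the sample input, SplitWithDelimiters returns nine tokens ("aaa", ",", "bbbb", ".", and so on), and Rejoin with {"xxx", "yyy", "zzz", "1111", "222"} gives "xxx,yyy.zzz|1111:222".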

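4 Answers:

Answer 0 (score: 3)

Here you go. This handles any combination of delimiters in any order, and also allows for delimiters that are not actually found in the string. It took me a while to come up with this and, now that I've posted it, it looks more complex than any of the other answers!

Ah well, I'll keep it here anyway.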

public static string SplitAndReJoin(string str, char[] delimiters, 
  Func<string[], string[]> mutator)
{
  //first thing to know is which of the delimiters are 
  //actually in the string, and in what order
  //Using ToArray() here to get the total count of found delimiters
  var delimitersInOrder = (from ci in
                            (from c in delimiters
                             from i in FindIndexesOfAll(str, c)
                             select new { c, i })
                          orderby ci.i
                          select ci.c).ToArray();
  if (delimitersInOrder.Length == 0)
    return str;

  //now split and mutate the string
  string[] strings = str.Split(delimiters);
  strings = mutator(strings);
  //now build a format string
  //note - this operation is much more complicated if you wish to use 
  //StringSplitOptions.RemoveEmptyEntries
  string formatStr = string.Join("",
    delimitersInOrder.Select((c, i) => string.Format("{{{0}}}", i)
      + c));
  //deals with the 'perfect' split - i.e. there's always two values
  //either side of a delimiter
  if (strings.Length > delimitersInOrder.Length)
    formatStr += string.Format("{{{0}}}", strings.Length - 1);

  return string.Format(formatStr, strings);
}

public static IEnumerable<int> FindIndexesOfAll(string str, char c)
{
  int startIndex = 0;
  int lastIndex = -1;

  while(true)
  {
    lastIndex = str.IndexOf(c, startIndex);
    if (lastIndex != -1)
    {
      yield return lastIndex;
      startIndex = lastIndex + 1;
    }
    else
      yield break;
  }
}

Here is a test you can use to verify it:

[TestMethod]
public void TestSplitAndReJoin()
{
  //note - mutator does nothing
  Assert.AreEqual("a,b", SplitAndReJoin("a,b", ",".ToCharArray(), s => s));
  //insert a 'z' in front of every sub string.
  Assert.AreEqual("zaaa,zbbbb.zccc|zdddd:zeee", SplitAndReJoin("aaa,bbbb.ccc|dddd:eee",
    ",.|:".ToCharArray(), s => s.Select(ss => "z" + ss).ToArray()));
  //re-ordering of delimiters + mutate
  Assert.AreEqual("zaaa,zbbbb.zccc|zdddd:zeee", SplitAndReJoin("aaa,bbbb.ccc|dddd:eee",
    ":|.,".ToCharArray(), s => s.Select(ss => "z" + ss).ToArray()));
  //now how about leading or trailing results?
  Assert.AreEqual("a,", SplitAndReJoin("a,", ",".ToCharArray(), s => s));
  Assert.AreEqual(",b", SplitAndReJoin(",b", ",".ToCharArray(), s => s));
}

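Note that I've assumed you need to be able to do something with the elements of the array, to manipulate the individual strings before joining them back together. Otherwise you would presumably just keep the original string!

The method builds a dynamic format string. No guarantees on efficiency here :)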
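To make the "dynamic format string" step more concrete, here is a small sketch that isolates it. BuildFormat is a hypothetical helper, not part of the answer's code; it assumes the delimiters are already listed in the order they occur and that exactly one more piece than delimiters will be supplied. It also assumes none of the delimiters is '{' or '}', since those would need escaping in a format string.

```csharp
using System;
using System.Linq;

public static class FormatStringDemo
{
    // Hypothetical helper mirroring the format-string step: given the
    // delimiters in the order they occur, produce a template such as
    // "{0},{1}.{2}|{3}:{4}".
    public static string BuildFormat(char[] delimitersInOrder)
    {
        // One placeholder before each delimiter...
        string body = string.Concat(
            delimitersInOrder.Select((c, i) => "{" + i + "}" + c));
        // ...and one final placeholder after the last delimiter.
        return body + "{" + delimitersInOrder.Length + "}";
    }
}
```

For the delimiters found in "aaa,bbbb.ccc|dddd:eee" this produces "{0},{1}.{2}|{3}:{4}", and passing that to string.Format along with the mutated pieces fills each slot in order.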

Answer 1 (score: 3)

Here is MagicSplit:

public IEnumerable<Tuple<string,char>> MagicSplit(string input, char[] split)
{    
    var buffer = new StringBuilder();
    foreach (var c in input)
    {
        if (split.Contains(c)) 
        {
            var result = buffer.ToString();
            buffer.Clear();
            yield return Tuple.Create(result,c);
        }
        else
        {
            buffer.Append(c);
        }
    }
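    // The final segment has no delimiter after it; '\0' marks "none" so that
    // MagicJoin does not append a stray character at the end.
    yield return Tuple.Create(buffer.ToString(),'\0');
}

And two MagicJoin overloads:

public string MagicJoin(IEnumerable<Tuple<string,char>> split)
{
    // Append each segment, followed by its delimiter unless it is the '\0' marker
    return split.Aggregate(new StringBuilder(), (sb, tup) => tup.Item2 == '\0' ? sb.Append(tup.Item1) : sb.Append(tup.Item1).Append(tup.Item2)).ToString();
}

public string MagicJoin(IEnumerable<string> strings, IEnumerable<char> chars)
{
    // Pair each segment with its delimiter, skipping the '\0' marker
    return strings.Zip(chars, (s,c) => c == '\0' ? s : s + c).Aggregate(new StringBuilder(), (sb, s) => sb.Append(s)).ToString();
}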

Usage:

var s = "aaa,bbbb.ccc|dddd:eee";

// simple
var split = MagicSplit(s, new char[] {',','.','|',':'}).ToArray();
var joined = MagicJoin(split);    

// if you want to change the strings
var strings = split.Select(tup => tup.Item1).ToArray();
var chars = split.Select(tup => tup.Item2).ToArray();
strings[0] = "test";
var joined2 = MagicJoin(strings,chars);  // new name: 'joined' is already declared above

Answer 2 (score: 3)

It may be easier to do this with the Regex class:

input = Regex.Replace(input, @"[^,.|:]+", DoSomething);

DoSomething is a method or lambda that transforms the item in question, for example:

string DoSomething(Match m)
{
    return m.Value.ToUpper();
}

For this example, the output string for "aaa,bbbb.ccc|dddd:eee" would be "AAA,BBBB.CCC|DDDD:EEE".

If you use a lambda, you can easily keep state, like this:

int i = 0;
Console.WriteLine(Regex.Replace("aaa,bbbb.ccc|dddd:eee", @"[^,.|:]+",
    _ => (++i).ToString()));

Output:

1,2.3|4:5

It depends on what kind of transformation you are applying to the items.
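If the new values are already sitting in an array, as in the question, the same stateful-lambda trick can feed them in one by one. The following is only a sketch under that assumption; ReplacePieces is a hypothetical name, and the pattern is the one used above, so empty pieces (two delimiters in a row) are not matched and do not consume a replacement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexRejoin
{
    // Hypothetical helper: replaces the n-th run of non-delimiter characters
    // with replacements[n]; the delimiters themselves are never touched.
    public static string ReplacePieces(string input, string[] replacements)
    {
        int i = 0;
        return Regex.Replace(input, @"[^,.|:]+",
            // Fall back to the original text if we run out of replacements
            m => i < replacements.Length ? replacements[i++] : m.Value);
    }
}
```

Calling ReplacePieces("aaa,bbbb.ccc|dddd:eee", new[] {"xxx", "yyy", "zzz", "1111", "222"}) gives "xxx,yyy.zzz|1111:222".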

Answer 3 (score: 1)

How about this?


var x = "aaa,bbbb.ccc|dddd:eee";

var matches = Regex.Matches(x, "(?<Value>[^\\.,|\\:]+)(?<Separator>[\\.,|\\:]?)");

var result = new StringBuilder();

foreach (Match match in matches)
{
    result.AppendFormat("{0}{1}", match.Groups["Value"], match.Groups["Separator"]);
}

Console.WriteLine(result.ToString());
Console.ReadLine();

Or, if you like LINQ (I do):


var x = "aaa,bbbb.ccc|dddd:eee";
var matches = Regex.Matches(x, "(?<Value>[^\\.,|\\:]+)(?<Separator>[\\.,|\\:]?)");
var reassembly = matches.Cast<Match>().Aggregate(new StringBuilder(), (a, v) => a.AppendFormat("{0}{1}", v.Groups["Value"], v.Groups["Separator"])).ToString();
Console.WriteLine(reassembly);
Console.ReadLine();

Needless to say, you can do something with the parts before reassembling them, which I assume is the point of this exercise.
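As a rough illustration of "doing something with the parts", the sketch below wraps the same Value/Separator pattern in a helper that takes a transform function. The Transform name and its signature are assumptions made for this example. Because Value must contain at least one character, a leading delimiter or an empty piece between two delimiters is dropped from the output, so this only round-trips inputs shaped like the sample.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ReassembleDemo
{
    // Hypothetical helper: applies transform to each Value group and then
    // re-appends the Separator group that followed it in the input.
    public static string Transform(string input, Func<string, string> transform)
    {
        var matches = Regex.Matches(input,
            @"(?<Value>[^.,|:]+)(?<Separator>[.,|:]?)");
        var result = new StringBuilder();
        foreach (Match match in matches)
        {
            result.Append(transform(match.Groups["Value"].Value));
            result.Append(match.Groups["Separator"].Value);
        }
        return result.ToString();
    }
}
```

For example, Transform("aaa,bbbb.ccc|dddd:eee", s => s.ToUpper()) yields "AAA,BBBB.CCC|DDDD:EEE".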