Splitting a string with a special case

Date: 2016-07-13 08:12:46

Tags: c# string split

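I have a string that is limited to 140 characters. Usually, my code produces more than 140. The string is a set of values in this format: Mxxxx, where x can be any digit, and the number of digits is not fixed. So I can have M1, or I can also have M281.

If the string is longer than 140 characters, I want to take the first 140, but if the last value is cut in half, I don't want it in my string at all.

However, I still need to keep the second half in a local variable.

For example, let's say this is the string: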

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619"

Let's say these are the first 140 characters:

"M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M69"

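The last value was M6919, but it has been cut down to M69.

What is the most efficient way to say: split if it is longer than 140, but if the last value in the new string was split in two, remove it from the first part and put it into another string value together with the rest of the original string?

There are probably many ways to achieve this. I could use an if or a switch/case and say that if the first letter of the second string is not 'M', then I know the value was split and I should remove it from the first string. But does anyone have a cleaner solution than that?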

private static string CreateSettlmentStringsForUnstructuredField(string settlementsString)
{
    string returnSettlementsString = settlementsString.Replace(", ", " ");

    if (returnSettlementsString.Length > 140)
    {
        // Substring returns a new string, so the result has to be assigned.
        returnSettlementsString = returnSettlementsString.Substring(0, 140);
        /* If returnSettlementsString was split in such a way
           that the last value was broken into two parts, take that value
           out of returnSettlementsString and put it in some new
           string value with the other half of the string. */
    }
    return returnSettlementsString;
} 

8 Answers:

Answer 0 (score: 2)

Something like this might work:

string result;
if (input.Length > 140)
{
    result = new string(input.Take(140).ToArray());
    if (input[140] != ',') // ensures that we don't drop the last complete value when the character right after the cut is a comma
        result = result.Substring(0, result.LastIndexOf(','));
} 
else result = input;

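If the total length is longer, it simply takes the first 140 characters. It then searches for the last index of a comma and takes all characters up to that comma.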
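The question also asks to keep the part that was cut off. A minimal sketch of that, assuming the values are always separated by ", "; the class name `SettlementCutter` and the method `CutAtComma` are made up for illustration and are not part of the answer above:

```csharp
using System;

public static class SettlementCutter
{
    // Hypothetical helper: returns the part that fits into maxLength characters
    // and hands back everything after the cut through the 'rest' parameter.
    public static string CutAtComma(string input, int maxLength, out string rest)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        if (input.Length <= maxLength)
        {
            rest = string.Empty;
            return input;
        }

        // Search backwards, starting at index maxLength. A comma exactly at
        // maxLength means the first maxLength characters end on a whole value.
        int cut = input.LastIndexOf(',', maxLength);
        if (cut < 0)
        {
            // No separator in range: fall back to a hard cut (an assumption,
            // the question does not say what should happen in this case).
            rest = input.Substring(maxLength);
            return input.Substring(0, maxLength);
        }

        rest = input.Substring(cut + 1).TrimStart();
        return input.Substring(0, cut);
    }
}
```

Called as `string head = SettlementCutter.CutAtComma(input, 140, out string rest);` (or with `rest` declared beforehand on older compilers), `head` holds only whole values and `rest` starts with the value that would have been broken.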

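Answer 1 (score: 1)

The best approach is to split the string into "words" and then reassemble them using a StringBuilder. Untested, rough code would look like this: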

public IEnumerable<string> SplitSettlementStrings(string settlementsString)
{
    var sb = new StringBuilder();
    foreach (var word in WordsFrom(settlementsString))
    {
        var extraFragment = $"{word}, ";
        if (sb.Length + extraFragment.Length < 140)
        {
            sb.Append(extraFragment);
        }
        else
        {
            // we'd overflow the 140 char limit, so return this fragment
            // and start the next one with the current word
            yield return sb.ToString();
            sb = new StringBuilder(extraFragment);
        }
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        yield return sb.ToString();
    }
}

You'll need something like this to split out the words:

 public IEnumerable<string> WordsFrom(string settlementsString) 
 {
    // split on commas, then trim to remove whitespace;
    return settlementsString.Split(',').Select(x => x.Trim()).Where(x => x.Length > 0);
 }

You would use the whole thing like this:

 var settlementStringsIn140CharLengths = SplitSettlementStrings("M234, M456, M452 ...").ToArray();

Edit

An old-skool .NET version would look like this:

public ICollection<string> SplitSettlementStrings(string settlementsString)
{
    List<string> results = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach (string word in WordsFrom(settlementsString))
    {
        string extraFragment = word + ", ";
        if (sb.Length + extraFragment.Length < 140)
        {
            sb.Append(extraFragment);
        }
        else
        {
            // we'd overflow the 140 char limit, so store this fragment
            // and start the next one with the current word
            results.Add(sb.ToString());
            sb = new StringBuilder(extraFragment);
        }
    }

    if (sb.Length > 0)
    {
        // we may have content left in the string builder
        results.Add(sb.ToString());
    }
    return results;
}

 public ICollection<string> WordsFrom(string settlementsString) 
 {
    // split on commas, then trim to remove whitespace;
    string[] fragments = settlementsString.Split(',');
    List<string> result = new List<string>();
    foreach(string fragment in fragments) 
    {
        string candidate = fragment.Trim();
        if (candidate.Length > 0) 
        {
            result.Add(candidate);
        }
    } 
    return result;
 }

Answer 2 (score: 0)

Something like this should work:

string test = "M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619";

if (test.Length > 140)
{
    if (test[140] != ',' && test[140] != ' ') // Last entry was split?
        test = test.Substring(0, test.LastIndexOf(',', 139)); // Take up to but not including the last ','
    else
        test = test.Substring(0, 140).TrimEnd(','); // Clean cut: keep the first 140 characters, minus a trailing ','
}

Console.WriteLine(test);

Answer 3 (score: 0)

My take, just for fun:

var ssplit = theString.Replace(", ", "#").Split('#');       
var sb = new StringBuilder();
for(int i = 0; i < ssplit.Length; i++)
{
    if(sb.Length + ssplit[i].Length > 138) // 140 minus the ", "
        break;
    if(sb.Length > 0) sb.Append(", ");
    sb.Append(ssplit[i]);
}

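Here I split the string into its Mxxx parts. Then I iterate over those parts until the next part would overflow 140 (or rather 138, because the ", " separator has to be included in the count).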

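The same split-and-count idea can also hand back the values that did not fit, which is what the question asks for. This is only a sketch under the assumption that the separator is always ", "; `SettlementSplitter` and `TakeWholeValues` are hypothetical names. Unlike the loop above, it counts the separator only when a value is actually preceded by one, so it compares against the full limit instead of 138:

```csharp
using System;
using System.Collections.Generic;

public static class SettlementSplitter
{
    // Hypothetical helper: returns as many whole values as fit into maxLength
    // characters and passes the remaining values back through 'rest'.
    public static string TakeWholeValues(string input, int maxLength, out string rest)
    {
        const string separator = ", ";
        string[] parts = input.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);

        var taken = new List<string>();
        int length = 0;
        int i = 0;
        for (; i < parts.Length; i++)
        {
            // The separator is only needed between values, not before the first one.
            int needed = parts[i].Length + (taken.Count > 0 ? separator.Length : 0);
            if (length + needed > maxLength)
                break;

            taken.Add(parts[i]);
            length += needed;
        }

        // Everything from index i onwards did not fit.
        rest = string.Join(separator, parts, i, parts.Length - i);
        return string.Join(separator, taken);
    }
}
```

If the whole input fits, `rest` comes back as an empty string.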

Answer 4 (score: 0)

If you don't want to split the string into a list, I would do the following:

string myString = "M19, M42........";
string result;
int index = 141;

do
{
    //Decrement index to reduce the substring size
    index--;

    //Make the result the new length substring
    result = myString.Substring(0, index);

}while (myString[index] != ','); //Check if our result contains a comma as the next char to check if we're at the end of an entry

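So you basically just substring the original string at 140 and check whether the character at position 141 is a comma, which indicates a "clean" cut. If it is not, it substrings at 139, checks position 140 for a comma, and so on.

Answer 5 (score: 0)

Here is a solution. It processes the string backwards, starting from the 141st character.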

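// Requires: using System.Linq; (for Contains on the array)
public static string Normalize(string input, int length)
{
    var terminators = new[] { ',', ' ' };
    if (input.Length <= length)
        return input;

    // Walk backwards from the first character past the limit until a separator is found.
    int i = length;
    while (i > 0 && !terminators.Contains(input[i]))
        i = i - 1;

    return input.Substring(0, i).TrimEnd(' ', ',');
}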

Normalize(settlementsString, 140);

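Answer 6 (score: 0)

Because of the constant memory allocation for new strings, this is probably not the most performance-conscious solution, but this does sound like some kind of one-off raw data input. We can simply remove "tokens" from the input while we have more than 140 characters: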

const string separator = ", ";

while (input.Length > 140)
{
     int delStartIndex = input.LastIndexOf(separator);
     int delLength = input.Length - delStartIndex;

     input = input.Remove(delStartIndex, delLength);
}

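A more performance-minded approach would be to create an IEnumerable<string> or string[] of the substrings and calculate their total length before joining. Something like this: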

const string separator = ", ";
var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

var length = splitInput[0].Length;
var targetIndex = 1;

for (targetIndex = 1; length <= 140 && targetIndex < splitInput.Length; targetIndex++) // stop at the end of the array when everything fits
    length += separator.Length + splitInput[targetIndex].Length;

if (length > 140)
    targetIndex--;

var splitOutput = new string[targetIndex];
Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

var output = string.Join(separator, splitOutput);

We could even make a nice extension method:

public static class StringUtils
{
    public static string TrimToLength(this string input, string separator, int targetLength)
    {
        var splitInput = input.Split(separator.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

        var length = splitInput[0].Length;
        var targetIndex = 1;

        for (targetIndex = 1; length <= targetLength && targetIndex < splitInput.Length; targetIndex++) // stop at the end of the array when everything fits
            length += separator.Length + splitInput[targetIndex].Length;

        if (length > targetLength)
            targetIndex--;

        var splitOutput = new string[targetIndex];
        Array.Copy(splitInput, 0, splitOutput, 0, targetIndex);

        return string.Join(separator, splitOutput);
    }
}

And call it like this:

input.TrimToLength(", ", 140);

Or:

input.TrimToLength(separator: ", ", targetLength:140);

Answer 7 (score: 0)

I use this:

static string FirstN(string s, int n = 140)
{
    if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
    while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
    return s.Substring(0, n);
}

Working, tested sample code (with the output in comments):

using System;
namespace ConsoleApplication1
{
    class Program
    {
        static string FirstN(string s, int n = 140)
        {
            if (string.IsNullOrEmpty(s) || s.Length <= n) return s;
            while (n > 0 && s[n] != ' ' && s[n] != ',') n--;
            return s.Substring(0, n);
        }
        static void Main(string[] args)
        {
            var s = FirstN("M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M669, M6753, M6919, M69, M6753, M6919, M6169, M63, M6919, M6169, M6753, M6919, M619, M653, M6919, M66, M6753, M19, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M619");

            Console.WriteLine(s.Length); // 136
            Console.WriteLine(s);  //M5903, M6169, M6753, M619, M6169, M6753, M6919, M6169, M6753, M919, M6169, M6753, M6919, M6169, M6753, M6919, M6169, M6753, M919, M6169,
        }
    }
}

I hope this helps.