Question

I have a string like this:

string Text = "012345678901234567890123456789";

and a List<int> of indexes:

List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };

with the following restrictions:

the list contains duplicates
the list is not sorted
the list may contain indexes > Text.Length

What is the best way to remove from the text the characters at the positions given in the index list?

Expected output:

035681234679012456789

Is there a more efficient way than this?

foreach (int index in Indexes
                        .OrderByDescending(x => x)
                        .Distinct()
                        .Where(x => x < Text.Length))
{
    Text = Text.Remove(index, 1);
}

Update: here are benchmarks of the current answers (a string with 100,000 characters and a List<int> of length 10,000):

Gallant: 3,322 ticks
Tim Schmelter: 8,602,576 ticks
Sergei Zinovyev: 9,002 ticks
rbaghbanli: 7,137 ticks
Jirí Tesil Tesarík: 72,580 ticks
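
The thread does not show how these ticks were measured. As a rough sketch only, a comparable measurement could be set up with Stopwatch as below. The class and method names (RemovalBenchmark, MakeText, MakeIndexes, MeasureTicks) are hypothetical, and the use of Stopwatch ticks is an assumption, not something the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Hypothetical helper for timing a removal strategy; not taken from the question.
public static class RemovalBenchmark
{
    // Builds a digit string of the requested length, following the pattern in the question.
    public static string MakeText(int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append((char)('0' + i % 10));
        return sb.ToString();
    }

    // Generates `count` pseudo-random indexes in [0, maxExclusive); duplicates are possible.
    public static List<int> MakeIndexes(int count, int maxExclusive, int seed)
    {
        var rng = new Random(seed);
        var result = new List<int>(count);
        for (int i = 0; i < count; i++)
            result.Add(rng.Next(maxExclusive));
        return result;
    }

    // Runs `removal` once and returns the elapsed Stopwatch ticks (assumed unit).
    public static long MeasureTicks(Func<string, List<int>, string> removal,
                                    string text, List<int> indexes)
    {
        var sw = Stopwatch.StartNew();
        removal(text, indexes);
        sw.Stop();
        return sw.ElapsedTicks;
    }
}
```

A single run like this includes JIT warm-up, so repeating the measurement several times and discarding the first run would give steadier numbers.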

Answer 1

Here is a more or less elegant LINQ way:

Text = new string(Text.Where((c, index) => !Indexes.Contains(index)).ToArray());

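It uses the overload of Enumerable.Where that passes the index of each item in the sequence to the predicate.

If you want the most efficient approach rather than the most readable one, and the text is really large, you can use a HashSet<int> (which does not allow duplicates) instead of the list, and a StringBuilder to create the new string: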

var indexSet = new HashSet<int>(Indexes); // either create from the list(as shown here) or use it without your list
var textBuilder = new StringBuilder(Text.Length);

for(int i = 0; i < Text.Length; i++)
    if (!indexSet.Contains(i))
        textBuilder.Append(Text[i]);
Text = textBuilder.ToString();

Of course, you could also use the HashSet<int> in the LINQ approach to make it more efficient.
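
As a minimal sketch of that last suggestion: List<int>.Contains scans the list linearly, so the one-liner at the top of this answer does work proportional to the text length times the list length, whereas HashSet<int>.Contains is O(1) on average. The class and method names here (LinqHashSetRemoval, RemoveAt) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper; only the body mirrors the LINQ idea from this answer.
public static class LinqHashSetRemoval
{
    public static string RemoveAt(string text, IEnumerable<int> indexes)
    {
        // Duplicates collapse automatically; out-of-range values simply never match.
        var indexSet = new HashSet<int>(indexes);

        // Indexed Where overload: the predicate receives each char and its zero-based position.
        return new string(text.Where((c, i) => !indexSet.Contains(i)).ToArray());
    }
}
```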

Answer 2

This will work faster:

string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };

HashSet<int> hashSet = new HashSet<int>(Indexes);

StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; ++i)
{
    if (!hashSet.Contains(i))
    {
        sb.Append(Text[i]);
    }
}

string str = sb.ToString();

Answer 3

Yes, see the code below (it iterates over each sequence only once):

update Table1 set Credit_number = SUBSTRING(Credit_number, 1, 4)+'**********'
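
The statement above is SQL rather than C#, so as a hedged illustration of what "iterating over each sequence only once" could mean for Text and Indexes, here is a sketch that walks the index list once to flag positions and then walks the text once to copy the rest. It is not this answer's original code; the BitArray and the names SinglePassRemoval and RemoveAt are assumptions made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration: one pass over the indexes, one pass over the text.
public static class SinglePassRemoval
{
    public static string RemoveAt(string text, IEnumerable<int> indexes)
    {
        var remove = new BitArray(text.Length); // all bits start as false

        // Pass over the indexes: duplicates just set the same bit again,
        // and out-of-range values are skipped.
        foreach (int index in indexes)
        {
            if (index >= 0 && index < text.Length)
                remove[index] = true;
        }

        // Pass over the text: copy every character that was not flagged.
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (!remove[i])
                sb.Append(text[i]);
        }
        return sb.ToString();
    }
}
```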

Answer 4

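The following assumes that your string is made up of a known set of characters. If you know for certain that, for example, the Unicode character '\uFFF0' will never appear in the string, you can use it as a placeholder to mark the characters to remove. In exchange for that restriction, this should be very fast: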

char temp = '\uFFF0';
StringBuilder sb = new StringBuilder(Text);
for (int i = 0; i < Indexes.Count; i++)
{
    if (Indexes[i] < sb.Length)
    {
        sb[Indexes[i]] = temp;
    }
}

Text = sb.Replace(temp.ToString(), null).ToString();

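This appears to be 3-4 times faster than building a HashSet, as some of the other answers suggest. http://ideone.com/mUILHg

If you cannot make the above assumption, you can build an array to hold this extra data instead of using a unique character. This takes two rounds of iteration (so it is a bit slower), but it is still O(n) (so it should typically be faster than putting the indexes into a hash map before iterating).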
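
Because the placeholder trick silently deletes any character that happens to equal the placeholder, one option is to check the assumption before relying on it. The sketch below is an illustrative variant, not part of the original answer; the names PlaceholderRemoval and RemoveAt, the up-front IndexOf check, and the negative-index guard are all additions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical guarded variant of the placeholder approach.
public static class PlaceholderRemoval
{
    public static string RemoveAt(string text, List<int> indexes, char placeholder = '\uFFF0')
    {
        // Verify the assumption: the placeholder must not already occur in the text.
        if (text.IndexOf(placeholder) >= 0)
            throw new ArgumentException("The placeholder character already occurs in the text.", nameof(placeholder));

        // Mark every in-range position with the placeholder.
        var sb = new StringBuilder(text);
        foreach (int index in indexes)
        {
            if (index >= 0 && index < sb.Length)
                sb[index] = placeholder;
        }

        // Strip all placeholders in one Replace call.
        return sb.Replace(placeholder.ToString(), string.Empty).ToString();
    }
}
```

The extra IndexOf scan costs one more pass over the text, so it trades a little of the speed advantage for safety.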

bool[] exclude = new bool[Text.Length];
for (int i = 0; i < Indexes.Count; i++)
{
    if (Indexes[i] < exclude.Length)
    {
        exclude[Indexes[i]] = true;
    }
}
StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; i++)
{
    if (!exclude[i])
    {
        sb.Append(Text[i]);
    }
}
Text = sb.ToString();

Quick benchmark: http://ideone.com/3d2uPH

Answer 5

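A modified solution that uses a byte array (which could be replaced by a bool array) instead of a hash table. Pros: linear complexity. Cons: the flag array needs extra memory.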

string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };
byte[] contains = new byte[Text.Length];
Indexes.ForEach(p => { if (p < Text.Length) contains[p] = 1; });
var output = string.Concat(Enumerable.Range(0, Text.Length).Where(p => contains[p] != 1).Select(p => Text[p]));

Remove characters from a string
