C#: Efficient de-duplication of data in a single DataTable column

Time: 2017-03-15 09:16:55

Tags: c#

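I have a DataTable with some data, as shown below, and need to de-duplicate any repeated names in the Name field by appending [1], [2], and so on.

The current code below works, but it is slow on large tables. Any tips on the most efficient way to do this in C# would be appreciated.

Current table sample: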

ID  Name      X   Y
1   John      45  66
2   Paul      44  66
3   George    88  102
4   John      33  90
5   John      53  37
6   Paul      97  65
7   Ringo     01  87
8   Ringo     76  65

Required table sample:

ID  Name      X   Y
1   John[1]   45  66
2   Paul[1]   44  66
3   George    88  102
4   John[2]   33  90
5   John[3]   53  37
6   Paul[2]   97  65
7   Ringo[1]  01  87
8   Ringo[2]  76  65

Current code:

foreach (DataRow aRow in ds.Tables[0].Rows) // run through all rows
{
    string aName = aRow["Name"].ToString();
    DataRow[] FoundRows = ds.Tables[0].Select("Name = '" + aName + "'"); // find all rows with the same name

    if (FoundRows.Length > 1) // a row always finds itself, so more than one match means duplicates
    {
        int i = 1;
        foreach (DataRow row in FoundRows)
        {
            row["Name"] = row["Name"].ToString() + "[" + i + "]";
            i++;
        }
        ds.Tables[0].AcceptChanges(); // ensure the rows are updated before looping around
    }
}

2 Answers:

Answer 0: (score: 1)

Here is one approach:

DataTable table = new DataTable();
//test data
table.Columns.Add("Name");
table.Columns.Add("X", typeof(int));
table.Rows.Add(new object[] { "john", 10 });
table.Rows.Add(new object[] { "paul", 44 });
table.Rows.Add(new object[] { "ringo", 312 });
table.Rows.Add(new object[] { "george", 30 });
table.Rows.Add(new object[] { "john", 100 });
table.Rows.Add(new object[] { "paul", 443 });

//convert the DataTable to an enumerable collection of rows, then group by name,
//skipping groups with only one row (such as george or ringo)
var groupedData = table.AsEnumerable().GroupBy(row => row[0].ToString()).Where(g => g.Count() > 1);

//iterate through each group of <string, DataRow>
foreach (var group in groupedData)
{
    int counter = 1; //counter for "[x]" suffix
    //iterate through all rows under one name, eg. John
    foreach (var groupedItem in group)
    {
        //add [x] 
        groupedItem[0] = string.Format("{0}[{1}]", group.Key, counter);
        counter++;
    }
}

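Edit: simplified the code and made it more efficient, as suggested by AdrianWragg.

Answer 1: (score: 0)

A good old for loop that updates the whole table in a single pass is probably the fastest approach: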

var foundNames = new Dictionary<string, int>();
for (int rowInd = 0; rowInd < dataTable.Rows.Count; rowInd++)
{
    // If name is not yet found in foundNames, then store its row
    // index. Don't update the dataTable yet -- this is the only
    // occurrence so far.
    // The index is stored inverted to distinguish from count.
    //
    // If name is found in foundNames, retrieve the count.
    // If count is inverted (non-positive), then we've encountered
    // the name for the second time. In this case update both the row
    // with the first occurrence and the current row. Store a count of 2.
    //
    // If count is positive, then it's the third or a later occurrence.
    // Update the current row only and store the incremented count.

    var name = dataTable.Rows[rowInd]["Name"].ToString();
    int count;
    if (!foundNames.TryGetValue(name, out count))
        foundNames.Add(name, -rowInd);
    else
    {
        if (count <= 0)
        {
            dataTable.Rows[-count]["Name"] = name + "[1]";
            count = 1;
        }
        count++;
        dataTable.Rows[rowInd]["Name"] = name + "[" + count + "]";
        foundNames[name] = count;
    }
}
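
For comparison, here is a minimal two-pass sketch of the same dictionary idea, wrapped in a small console program so it can be checked against the sample table from the question. It is not taken from either answer: the class name NameDeduplicator and the method AppendSuffixes are hypothetical names chosen for illustration, and only the ID and Name columns are populated. The first pass counts how often each name occurs, and the second pass numbers only the names that occur more than once, so each row is touched a constant number of times instead of triggering a Select over the whole table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class NameDeduplicator
{
    // Hypothetical helper (not from the original posts): appends [1], [2], ...
    // to every value in the given column that occurs more than once.
    public static void AppendSuffixes(DataTable table, string columnName)
    {
        // Pass 1: count how often each name occurs.
        var totals = new Dictionary<string, int>();
        foreach (DataRow row in table.Rows)
        {
            string name = row[columnName].ToString();
            int seen;
            totals.TryGetValue(name, out seen); // seen stays 0 if the name is new
            totals[name] = seen + 1;
        }

        // Pass 2: number the rows whose name occurs more than once.
        var used = new Dictionary<string, int>();
        foreach (DataRow row in table.Rows)
        {
            string name = row[columnName].ToString();
            if (totals[name] < 2)
                continue; // unique names stay unchanged

            int last;
            used.TryGetValue(name, out last);
            used[name] = last + 1;
            row[columnName] = name + "[" + (last + 1) + "]";
        }
    }

    public static void Main()
    {
        // Sample data from the question (X and Y columns omitted for brevity).
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        string[] names = { "John", "Paul", "George", "John", "John", "Paul", "Ringo", "Ringo" };
        for (int i = 0; i < names.Length; i++)
            table.Rows.Add(i + 1, names[i]);

        AppendSuffixes(table, "Name");

        foreach (DataRow row in table.Rows)
            Console.WriteLine(row["ID"] + "\t" + row["Name"]);
    }
}
```

Compared with the single-pass loop above, this version reads the table twice but avoids the inverted-index trick, which may make it easier to follow. Which of the two is faster on a given table would need to be measured.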