Question

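I'm trying to remove duplicate data from a DataTable, but I don't simply want to keep the first entry and delete every later duplicate. I need to set a condition so that the incorrect entry is the one that gets removed.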

For example:

ID          Value
111          A
222          B
333          C
444          A

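I want to remove the 111 row and keep 444, because both rows share the duplicate value A. The other solutions I found would remove 444 instead. The closest thing I could find to my problem is this question: Remove Duplicate item from list based on condition

The answer there uses LINQ, which I'm not familiar with. I was thinking of using StartsWith to filter for the correct data, but I don't know how to implement it.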

var result = items
    .GroupBy(item => item.Name)
    .SelectMany(g => g.Count() > 1 ? g.Where(x => x.Price != 500) : g); // <-- I want to apply StartsWith here

I'd really appreciate it if someone could help me with this.

Answer 1

I think you need something like this:

var result = items
    .GroupBy(item => item.Name)
    // keep only groups with duplicates whose key matches the condition
    .Where(g => g.Count() > 1 && g.Key == "A") // or: g.Key.StartsWith("A")
    .SelectMany(g => g);

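This returns a sequence containing all of the "A" elements; you can then decide which of them to delete.

To select all duplicates except the last inserted element, so that they can be removed: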

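var result = items
    .GroupBy(item => item.Name)
    // groups without duplicates contribute nothing to delete
    .Where(g => g.Count() > 1)
    .SelectMany(g =>
    {
       // the element to keep: highest ID, assumed to be the last inserted
       var mainElement = g.OrderByDescending(x => x.ID).First();
       // everything else in the group is a duplicate to delete
       return g.Where(x => x.ID != mainElement.ID);
    });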
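
As a rough, self-contained sketch of the same idea, the snippet below runs against the four sample rows from the question. Modelling the rows as tuples, grouping on Value instead of Name, and treating the highest ID as the "last inserted" row are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows from the question, modelled as tuples (an assumption for this sketch).
var items = new List<(int ID, string Value)>
{
    (111, "A"), (222, "B"), (333, "C"), (444, "A"),
};

// For each Value that occurs more than once, select every row except the one
// with the highest ID. Assumption: a higher ID means "inserted later".
var toRemove = items
    .GroupBy(item => item.Value)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g.OrderByDescending(x => x.ID).Skip(1))
    .ToList(); // materialize before modifying the list

foreach (var item in toRemove)
{
    items.Remove(item);
}

// Prints: 222:B, 333:C, 444:A
Console.WriteLine(string.Join(", ", items.Select(i => $"{i.ID}:{i.Value}")));
```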

Answer 2

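You forgot to say why you want to keep 444 rather than 111, and not the other way around.

LINQ was developed to query data. LINQ never changes the original source sequence.

You can use LINQ to query the items you want to remove, and then use foreach to remove them one by one.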
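
A minimal sketch of that query-then-delete pattern, using a plain list of integers as a stand-in for the real data (the list and the "keep the first occurrence" rule are illustrative assumptions). It shows that building the query leaves the list untouched, and that the results are copied with ToList() before anything is removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 2, 3, 3, 3 };

// Query only: every occurrence after the first within each group of equal numbers.
var extras = numbers
    .GroupBy(n => n)
    .SelectMany(g => g.Skip(1));

// Prints 6: defining the query has not changed the list.
Console.WriteLine(numbers.Count);

// Snapshot the results first. Removing from the list while the lazy query
// is still enumerating it would throw InvalidOperationException.
foreach (var n in extras.ToList())
{
    numbers.Remove(n); // removes the first matching occurrence
}

// Prints: 1, 2, 3
Console.WriteLine(string.Join(", ", numbers));
```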

Querying for duplicates is easy. If you need this more often, consider creating an extension method for it:

static IEnumerable<IGrouping<TKey, TSource>> GetDuplicates<TSource, TKey>(
   this IEnumerable<TSource> source,
   Func<TSource, TKey> propertySelector)
{
    // TODO: check source and propertySelector not null

    // make groups of source items that have the same value for property:
    return source.GroupBy(item => propertySelector(item))

        // keep only the groups that have more than one element
        // it would be a waste to Count() the whole group; Skip(1).Any() stops at the second element
        .Where(group => group.Skip(1).Any());
}

This gives you groups of all source items that have a duplicate value for the selected property.

In your case:

var itemsWithDuplicateValues = mySourceItems.GetDuplicates(item => item.Value);

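This gives you all source items that have a duplicate item.Value, grouped by that item.Value.

Now that you've had time to work out why you want to keep the item with ID 444 rather than 111, you can write a function that takes a group of duplicates and returns the elements you want to remove.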

static IEnumerable<TSource> SelectItemsIWantToRemove<TSource>(
   IEnumerable<TSource> source)
{
     // TODO: check source not null
     // select the items that you want to remove:
     foreach (var item in source)
     {
         if (ShouldRemove(item)) // ShouldRemove is a placeholder for your own rule
           yield return item;
     }
     // TODO: make sure there is always one item that you want to keep
     // or decide what to do if there isn't any item that you want to keep
}

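Now that you have a function that selects the items to remove, it is easy to create a LINQ-like method that selects the items to remove from a sequence of duplicate groups: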

static IEnumerable<TSource> WhereIWantToRemove<TKey, TSource>(
   this IEnumerable<IGrouping<TKey, TSource>> duplicateGroups)
{
    foreach (var group in duplicateGroups)
    {
        // apply the per-group selection function defined above
        foreach (var sourceItem in SelectItemsIWantToRemove(group))
        {
            yield return sourceItem;
        }
    }
}

You could also use SelectMany for this.

Now put it all together:

static IEnumerable<TSource> WhereIWantToRemove<TSource, TKey>(
   this IEnumerable<TSource> source,
   Func<TSource, TKey> propertySelector)
{
    return source.GetDuplicates(propertySelector)
        .WhereIWantToRemove();
}

Usage:

var itemsToRemove = mySourceItems.WhereIWantToRemove(item => item.Value);
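
For reference, here is one way the pieces above might fit together in a compilable program. It is only a sketch under stated assumptions: the Row record, the DuplicateExtensions class name, the extra shouldRemove parameter, and the "remove IDs that do not start with 4" rule are all hypothetical choices made for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new List<Row>
{
    new Row("111", "A"), new Row("222", "B"), new Row("333", "C"), new Row("444", "A"),
};

// Hypothetical rule: among duplicates, remove rows whose ID does not start with "4".
var itemsToRemove = rows
    .WhereIWantToRemove(row => row.Value, row => !row.ID.StartsWith("4"))
    .ToList(); // snapshot before modifying the list

foreach (var row in itemsToRemove)
{
    rows.Remove(row);
}

// Prints: 222, 333, 444
Console.WriteLine(string.Join(", ", rows.Select(r => r.ID)));

// Hypothetical row type standing in for the real data.
record Row(string ID, string Value);

static class DuplicateExtensions
{
    // Groups that contain more than one item with the same key.
    public static IEnumerable<IGrouping<TKey, TSource>> GetDuplicates<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> propertySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (propertySelector == null) throw new ArgumentNullException(nameof(propertySelector));
        return source.GroupBy(propertySelector).Where(group => group.Skip(1).Any());
    }

    // Items inside duplicate groups that match the caller's removal rule.
    public static IEnumerable<TSource> WhereIWantToRemove<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> propertySelector,
        Func<TSource, bool> shouldRemove)
    {
        return source.GetDuplicates(propertySelector)
                     .SelectMany(group => group.Where(shouldRemove));
    }
}
```

Note that with a per-item rule like this, a duplicate group in which no ID starts with "4" would be removed entirely, which is the case the TODO in the answer warns about.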

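You can see that I chose to create several fairly small, easy-to-understand extension methods. Of course, you could combine them all into one big LINQ statement. However, I'm not sure you could convince your project leader that doing so would make your code more readable, testable, maintainable, and reusable. So my advice is to stick with the small extension methods.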

Answer 3

You can group the DataRows by value, select all rows that do not meet your condition, and then delete those rows:

var result = items.AsEnumerable()
                  .GroupBy(item => item.Field<string>("Value"))
                  .Where(g => g.Count() > 1)
                  .SelectMany(g => g.Where(x => !x.Field<string>("ID").StartsWith("4")));
foreach (var r in result) {
    r.Delete();
}
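
The following is a hedged, end-to-end sketch of this approach on an in-memory DataTable built from the question's sample data. Storing ID as a string column and calling AcceptChanges() before and after the deletion are assumptions of this sketch. The query result is copied with ToList() so that deleting rows cannot interfere with the enumeration of the table.

```csharp
using System;
using System.Data;
using System.Linq;

var table = new DataTable();
table.Columns.Add("ID", typeof(string));   // assumption: IDs are stored as strings
table.Columns.Add("Value", typeof(string));
table.Rows.Add("111", "A");
table.Rows.Add("222", "B");
table.Rows.Add("333", "C");
table.Rows.Add("444", "A");
table.AcceptChanges(); // rows are now Unchanged, as if loaded from a database

var rowsToDelete = table.AsEnumerable()
    .GroupBy(row => row.Field<string>("Value"))
    .Where(g => g.Count() > 1)
    .SelectMany(g => g.Where(row => !row.Field<string>("ID").StartsWith("4")))
    .ToList(); // snapshot, so deleting cannot disturb the enumeration

foreach (var row in rowsToDelete)
{
    row.Delete(); // marks the row as Deleted
}
table.AcceptChanges(); // removes the rows that were marked as Deleted

// Prints: 222, 333, 444
Console.WriteLine(string.Join(", ", table.AsEnumerable().Select(r => r.Field<string>("ID"))));
```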

Remove duplicates from a DataTable where the value starts with a letter
