How can I merge multiple rows with partially duplicated data into a single row while keeping the non-duplicated data?

Time: 2017-12-13 09:36:24

Tags: linq datatable merge

I have a large DataTable (more than 300K rows, 40 columns) with data like the following fragment (all values are strings):

  

  colA  colB  colC  ColD  ColE  ColF  ColG  ColH
  ----------------------------------------------
  A01   B01   C01   DA1   EA1   FA1   GA1   HA1
  A01   B01   C01   DA2   EA2   FA2   GA2   HA2
  A02   B02   C02   DA3   EA3   FA3   GA3   HA3
  A02   B02   C02   DA4   EA4   FA4   GA4   HA4
  A03   B03   C03   DA5   EA5   FA5   GA5   HA5
  A04   B04   C04   DA6   EA6   FA6   GA6   HA6

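Part of the data is duplicated. I want to merge the duplicated rows using colA + colB + colC as the key, keep the differing ColD, ColE, ColF values of every row (the expected result also carries ColG), and take the remaining columns from the first row. The expected result looks like this: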

  

  colA  colB  colC  ColD1  ColE1  ColF1  ColG1  ColD2  ColE2  ColF2  ColG2  ColH
  ------------------------------------------------------------------------------
  A01   B01   C01   DA1    EA1    FA1    GA1    DA2    EA2    FA2    GA2    HA1
  A02   B02   C02   DA3    EA3    FA3    GA3    DA4    EA4    FA4    GA4    HA3
  A03   B03   C03   DA5    EA5    FA5    GA5    null   null   null   null   HA5
  A04   B04   C04   DA6    EA6    FA6    GA6    null   null   null   null   HA6

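This is similar to a pivot, but with some differences. I tried T-SQL and LINQ with C#, but I don't know how to do it. Could someone please help? Thanks a lot.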

2 answers:

Answer 0: (score: 1)

Sounds like a job for ExpandoObject.

Given input records like the ones you provided:

var input = new DataTable();
input.Columns.Add("ColA");
input.Columns.Add("ColB");
input.Columns.Add("ColC");
input.Columns.Add("ColD");
input.Rows.Add("A01", "B01", "CA1", "DA1");
input.Rows.Add("A01", "B01", "CA2", "DA2");
input.Rows.Add("A02", "B02", "CA3", "DA3");
input.Rows.Add("A02", "B02", "CA4", "DA4");
input.Rows.Add("A03", "B03", "CA5", "DA5");
input.Rows.Add("A04", "B04", "CA6", "DA6");

you can convert each record into a dynamic, expandable object:

public IDictionary<string, Object> Map(DataRow row)
{
    var columns = row.Table.Columns;
    var result = new ExpandoObject() as IDictionary<string, Object>;
    // Copy every cell into the dictionary, keyed by its column name
    for (var index = 0; index < columns.Count; index++)
    {
        result.Add(columns[index].ColumnName, row[index]);
    }
    return result;
}

Then add some logic that groups the input by the marker element and expands the matching result where needed:

var seed = new List<IDictionary<string, Object>>();
var output = input
    .AsEnumerable()
    .Select(Map)
    .Aggregate(seed, (results, current)=>
    {
        // Find the existing result whose first value (the marker) occurs in the current record
        var query = from result in results
                    let marker = result
                        .Select(p => p.Value)
                        .FirstOrDefault()
                    where current.Values.Contains(marker)
                    select result;

        var found = query.SingleOrDefault();
        if (found == null)
        {
            // No match was found, so simply append the current record
            results.Add(current);
        }
        else
        {
            // A match was found, so isolate the values it does not contain yet
            var others = from value in current.Values
                         where !found.Values.Contains(value)
                         select value;

            // Append the new ones to the found result
            foreach (var value in others)
            {
                var index = found.Values.Count;
                found.Add($"Col{index}", value);
            }
        }

        return results;
    });

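The final result then contains one entry per marker value, with the extra values of the merged rows appended as additional columns (the original answer showed this in a screenshot).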

Check the gist for the complete example.
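Because each dictionary produced by the `Aggregate` call can end up with a different set of keys, turning `output` back into a table means collecting the union of all keys first. The following is only a minimal sketch of that step; the class `ExpandoTable` and the method `ToDataTable` are hypothetical names that are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ExpandoTable
{
    // Hypothetical helper: builds a DataTable whose columns are the union
    // of the keys found in all records.
    public static DataTable ToDataTable(IEnumerable<IDictionary<string, object>> records)
    {
        var list = records.ToList();
        var table = new DataTable();

        // Collect every distinct key once and create a column for it
        foreach (var name in list.SelectMany(r => r.Keys).Distinct())
        {
            table.Columns.Add(name, typeof(object));
        }

        foreach (var record in list)
        {
            var row = table.NewRow();   // cells that are never set stay DBNull
            foreach (var pair in record)
            {
                row[pair.Key] = pair.Value ?? DBNull.Value;
            }
            table.Rows.Add(row);
        }

        return table;
    }
}
```

Calling `ExpandoTable.ToDataTable(output)` would then give a table in which records that were never expanded simply have `DBNull` in the extra columns.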

Answer 1: (score: 0)

Note that this is not a generic solution (the key positions are hard-coded), but it works for the given example.

List<string[]> input = new List<string[]>()
{
    new string[] {"A01","B01","CA1","DA1"},
    new string[] {"A01","B01","CA2","DA2"},
    new string[] {"A02","B02","CA3","DA3"},
    new string[] {"A02","B02","CA4","DA4"},
    new string[] {"A03","B03","CA5","DA5"},
    new string[] {"A04","B04","CA6","DA6"},
};

var grouped = input.GroupBy(x => new { key1 = x[0], key2 = x[1] }, (keys, group) => new
{
    Key1 = keys.key1,
    Key2 = keys.key2,
    // Skip(2) keeps the two key values out of the result list
    Result = group.SelectMany(x => x.Skip(2)).ToList()
});

Output:

  { Key1 = "A01", Key2 = "B01", Result = ["CA1", "DA1", "CA2", "DA2"] }
  { Key1 = "A02", Key2 = "B02", Result = ["CA3", "DA3", "CA4", "DA4"] }
  { Key1 = "A03", Key2 = "B03", Result = ["CA5", "DA5"] }
  { Key1 = "A04", Key2 = "B04", Result = ["CA6", "DA6"] }
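The same `GroupBy` idea can be carried over to the shape asked for in the question (numbered repeat columns, remaining columns taken from the first row). The sketch below is one possible way to do it, not the answerer's code: the class `RowMerger`, the method `Merge`, and its parameters are hypothetical, and it assumes all columns are strings and that the separator character `\u001F` never occurs in the key values.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowMerger
{
    // Hypothetical helper: merges rows that share the same key columns into one row.
    // Columns in repeatColumns are repeated with a numeric suffix (ColD1, ColD2, ...);
    // every other non-key column is taken from the first row of each group.
    public static DataTable Merge(DataTable input, string[] keyColumns, string[] repeatColumns)
    {
        // Group rows by their combined key (separator assumed absent from the data)
        var groups = input.AsEnumerable()
            .GroupBy(row => string.Join("\u001F", keyColumns.Select(k => Convert.ToString(row[k]))))
            .Select(g => g.ToList())
            .ToList();

        var maxRepeats = groups.Count == 0 ? 0 : groups.Max(g => g.Count);

        // Columns that are neither keys nor repeated come from the first row only
        var firstRowColumns = input.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .Where(name => !keyColumns.Contains(name) && !repeatColumns.Contains(name))
            .ToList();

        // Output layout: keys, then one block of repeat columns per row in the
        // largest group, then the first-row columns
        var output = new DataTable();
        foreach (var key in keyColumns)
            output.Columns.Add(key);
        for (var n = 1; n <= maxRepeats; n++)
            foreach (var column in repeatColumns)
                output.Columns.Add(column + n);
        foreach (var name in firstRowColumns)
            output.Columns.Add(name);

        foreach (var group in groups)
        {
            var merged = output.NewRow();   // cells that are never set stay DBNull
            var first = group[0];
            foreach (var key in keyColumns)
                merged[key] = first[key];
            for (var n = 0; n < group.Count; n++)
                foreach (var column in repeatColumns)
                    merged[column + (n + 1)] = group[n][column];
            foreach (var name in firstRowColumns)
                merged[name] = first[name];
            output.Rows.Add(merged);
        }

        return output;
    }
}
```

For the table in the question this could be called as `RowMerger.Merge(table, new[] { "colA", "colB", "colC" }, new[] { "ColD", "ColE", "ColF", "ColG" })`. ColH is then taken from the first row of each group, and groups with a single row leave the ColD2 to ColG2 cells as `DBNull`.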