Is it really this hard to just do a LEFT JOIN?

Posted: 2019-01-30 23:54:20

Tags: c# linq datatable

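My incoming data arrives as DataTables. There are no static classes to fall back on. I have 2 tables, Customer and Billing: 7000 customers and 1200 billing records.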

Every customer record has a "ResponsiblePartyID". Multiple customers can share the same value, and it references the ID column of the billing table.

DataTable customer= ETL.ParseTable("customer"); // 7000 records
DataTable billing= ETL.ParseTable("billing");   // 1200 records

var JoinedTables = (from c in customer.AsEnumerable()
            join p in billing.AsEnumerable() on (string) c["ResponsiblePartyID"] equals (string) p["ID"] into ps
            from p in ps.DefaultIfEmpty()
            select new {c, p}
        );

I wouldn't mind if it spat out the results in the wrong format, but it only returns 2200 results instead of 7000.

It would make sense if it returned only 1200 or all 7000, but 2200 is a weird place for it to stop.

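I'm manually parsing binary data into a data source. I chose DataTable as the target because that seemed like the right approach, but after wrestling with LINQ and trying to do joins, I'm wondering whether I should reconsider.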

LINQ doesn't seem to be designed for querying DataTables, since I have to call .AsEnumerable() for every step and then .CopyToDataTable() after each step is done.
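For what it's worth, `AsEnumerable()` and `CopyToDataTable()` are only needed at the two ends of a query, not around every step: the operators in between all work on the same `IEnumerable<DataRow>`. A small sketch, with column names borrowed from the sample data in the update and assumed to be `int`/`string` typed:

```csharp
using System.Data;
using System.Linq;

public static class ChainedQuery
{
    // One AsEnumerable() at the start, one CopyToDataTable() at the end;
    // Where and OrderBy in between operate on DataRow objects directly.
    public static DataTable RespIdOneSortedByName(DataTable customers)
    {
        return customers.AsEnumerable()
            .Where(r => r.Field<int>("Resp_ID") == 1)
            .OrderBy(r => r.Field<string>("Name"))
            .CopyToDataTable(); // throws InvalidOperationException if no rows remain
    }
}
```

The copied table keeps the source schema, so the result can be queried again with another `AsEnumerable()` if needed.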

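I haven't defined static classes for all my data, because the properties of every value are already defined in the DataTable. So what is the "right" way to take 2 DataTables and do a LEFT JOIN (just like in SQL), where the results on the right don't exclude results on the left? If I start with a left table of 7000 rows, I want to end up with 7000. Where there's no matching record, fill it with nulls.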

I don't want to define every column; it should return a flattened array/DataTable, something like this:

var JoinedTables = (from c in customer.AsEnumerable()
            join p in billing.AsEnumerable() on (string) c["ResponsiblePartyID"] equals (string) p["ID"] into ps
            from p in ps.DefaultIfEmpty()
            select ALL_COLUMNS
        );
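A minimal sketch of one way to get a flat LEFT JOIN without declaring each column: build the output schema from both tables' columns, then copy values row by row, leaving the right-hand columns as DBNull when nothing matches. The helper name `LeftJoinFlatten`, the `right_` prefix for clashing column names, and comparing the boxed key values with `Equals` are illustrative assumptions, not a built-in API.

```csharp
using System.Data;
using System.Linq;

public static class DataTableJoin
{
    // Hypothetical helper: LEFT JOIN two DataTables on one key column each and
    // return a single flat DataTable containing every column of both inputs.
    public static DataTable LeftJoinFlatten(DataTable left, DataTable right,
                                            string leftKey, string rightKey)
    {
        var result = new DataTable();

        // Copy the left schema as-is.
        foreach (DataColumn col in left.Columns)
            result.Columns.Add(col.ColumnName, col.DataType);

        // Copy the right schema, prefixing names that already exist (e.g. "ID").
        var rightNames = new string[right.Columns.Count];
        for (int i = 0; i < right.Columns.Count; i++)
        {
            DataColumn col = right.Columns[i];
            string name = result.Columns.Contains(col.ColumnName)
                ? "right_" + col.ColumnName
                : col.ColumnName;
            rightNames[i] = name;
            result.Columns.Add(name, col.DataType);
        }

        // Index right rows by key; rows with a DBNull key can never match.
        var lookup = right.AsEnumerable()
            .Where(r => !r.IsNull(rightKey))
            .ToLookup(r => r[rightKey]);

        foreach (DataRow l in left.Rows)
        {
            var matches = l.IsNull(leftKey)
                ? Enumerable.Empty<DataRow>()
                : lookup[l[leftKey]];

            // DefaultIfEmpty() yields a single null when there is no match,
            // so every left row produces at least one output row.
            foreach (DataRow r in matches.DefaultIfEmpty())
            {
                DataRow outRow = result.NewRow();
                for (int i = 0; i < left.Columns.Count; i++)
                    outRow[i] = l[i];
                if (r != null)
                    for (int i = 0; i < right.Columns.Count; i++)
                        outRow[rightNames[i]] = r[i];
                // Unmatched rows keep DBNull in the right-hand columns.
                result.Rows.Add(outRow);
            }
        }
        return result;
    }
}
```

With the question's tables this would be called as `DataTableJoin.LeftJoinFlatten(customer, billing, "ResponsiblePartyID", "ID")`. The two key columns need to hold the same .NET type, because a boxed `int` never equals a boxed `string`.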

Update:

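I used the example from Jon Skeet's answer linked in the comments (Linq return all columns from all tables in the join). His solution isn't much different from my first attempt, and it still doesn't solve how to flatten the results into a single DataTable. Here is a sample of the data and the current output: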

Customers
ID  Resp_ID Name
1   1   Fatafehi
2   2   Dan
3   1   Anthony
4   1   Sekona
5   1   Osotonu
6   6   Robert
7   1   Lafo
8   1   Sarai
9   9   Esteban
10  10  Ashley
11  11  Mitch
12  64  Mark
13  11  Shawn
14  53  Kathy
15  53  Jasmine
16  16  Aubrey
17  17  Peter
18  18  Eve
19  19  Brenna
20  20  Shanna
21  21  Andrea

Billing
ID  30_Day  60_Day
2   null    null
6   null    null
9   null    null
10  null    null
11  null    null
64  null    null
53  null    null
16  null    null
17  null    null
18  null    null
19  null    null
20  -36.52  null
21  1843.30 null

Output:
ID  Resp_ID  Name     ID  30_Day   60_Day
2   2        Dan      2   null     null
6   6        Robert   6   null     null
9   9        Esteban  9   null     null
10  10       Ashley   10  null     null
11  11       Mitch    11  null     null
12  64       Mark     64  -131.20  null
13  11       Shawn    11  null     null
14  53       Kathy    53  null     null
15  53       Jasmine  53  null     null
16  16       Aubrey   16  null     null
17  17       Peter    17  null     null
18  18       Eve      18  null     null
19  19       Brenna   19  null     null
20  20       Shanna   20  -36.52   null
21  21       Andrea   21  1843.30  null

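Note that everyone with a Resp_ID of 1 is missing from the results. To display the output I use the following, and then fill in the null values for visualization: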

foreach (var row in joinedRows)
{
    Console.WriteLine(row.r1["ID"] + " " + row.r1["Resp_ID"] + " " + row.r1["Name"] + " " + row.r2["ID"] + " " + row.r2["30_Day"] + " " + row.r2["60_Day"]);
}
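The missing rows match what an inner join would produce: the sample Billing table has no row with ID 1, so the six customers whose Resp_ID is 1 have nothing to pair with, and only a group join followed by `DefaultIfEmpty()` keeps them. A small sketch using only the key values from the sample tables above:

```csharp
using System.Linq;

public static class JoinCounts
{
    // Resp_ID of each customer, in the order of the sample Customers table.
    public static readonly int[] CustomerRespIds =
        { 1, 2, 1, 1, 1, 6, 1, 1, 9, 10, 11, 64, 11, 53, 53, 16, 17, 18, 19, 20, 21 };

    // ID of each row in the sample Billing table (there is no 1).
    public static readonly int[] BillingIds =
        { 2, 6, 9, 10, 11, 64, 53, 16, 17, 18, 19, 20, 21 };

    // Inner join: customers without a matching billing row disappear.
    public static int InnerJoinCount() =>
        CustomerRespIds.Join(BillingIds, c => c, b => b, (c, b) => c).Count();

    // Left join: GroupJoin, then DefaultIfEmpty() keeps unmatched customers,
    // paired with null because the right side is projected to int?.
    public static int LeftJoinCount() =>
        CustomerRespIds
            .GroupJoin(BillingIds, c => c, b => b, (c, bs) => new { c, bs })
            .SelectMany(x => x.bs.Select(b => (int?)b).DefaultIfEmpty(),
                        (x, b) => new { RespId = x.c, BillingId = b })
            .Count();
}
```

Here the inner join yields 15 rows, matching the 15 lines of output shown, while the left join yields all 21. Once the join keeps those rows, the right-hand row is null for them, so the display loop would need a guard such as `row.r2 == null ? "null" : row.r2["ID"]` instead of `row.r2["ID"]`.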

3 answers:

Answer 0 (score: 1)

Giving some sample data is nice, but it would be even better to give it in a format that can be copied and pasted to play with.

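The relation between customers and billings is one-to-many, where "many" can be zero, one, or more. So you have to use .GroupJoin() instead of .Join(), which only returns the customer/billing pairs that actually match:

Sample data and query: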
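The Customer and Billing classes are used but not shown. A minimal pair consistent with the object initializers in the code that follows might look like this; the property types, in particular `double?` for the amounts (the sample assigns both `null` and literals such as `-36.52`), are assumptions:

```csharp
// Inferred from the object initializers; types are assumptions.
public class Customer
{
    public int Id { get; set; }
    public int Resp_Id { get; set; }   // refers to Billing.Id
    public string Name { get; set; }
}

public class Billing
{
    public int Id { get; set; }
    public double? Day30 { get; set; } // null when there is no 30-day amount
    public double? Day60 { get; set; }
}
```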

var customers = new[]
{
    new Customer{ Id = 1, Resp_Id = 1, Name = "Fatafehi" },
    new Customer{ Id = 2, Resp_Id = 2, Name = "Dan" },
    new Customer{ Id = 3, Resp_Id = 1, Name = "Anthony" },
    new Customer{ Id = 4, Resp_Id = 1, Name = "Sekona" },
    new Customer{ Id = 5, Resp_Id = 1, Name = "Osotonu" },
    new Customer{ Id = 6, Resp_Id = 6, Name = "Robert" },
    new Customer{ Id = 7, Resp_Id = 1, Name = "Lafo" },
    new Customer{ Id = 8, Resp_Id = 1, Name = "Sarai" },
    new Customer{ Id = 9, Resp_Id = 9, Name = "Esteban" },
    new Customer{ Id = 10, Resp_Id = 10, Name = "Ashley" },
    new Customer{ Id = 11, Resp_Id = 11, Name = "Mitch" },
    new Customer{ Id = 12, Resp_Id = 64, Name = "Mark" },
    new Customer{ Id = 13, Resp_Id = 11, Name = "Shawn" },
    new Customer{ Id = 14, Resp_Id = 53, Name = "Kathy" },
    new Customer{ Id = 15, Resp_Id = 53, Name = "Jasmine" },
    new Customer{ Id = 16, Resp_Id = 16, Name = "Aubrey" },
    new Customer{ Id = 17, Resp_Id = 17, Name = "Peter" },
    new Customer{ Id = 18, Resp_Id = 18, Name = "Eve" },
    new Customer{ Id = 19, Resp_Id = 19, Name = "Brenna" },
    new Customer{ Id = 20, Resp_Id = 20, Name = "Shanna" },
    new Customer{ Id = 21, Resp_Id = 21, Name = "Andrea" },
};

var billings = new[]
{
    new Billing{ Id = 2, Day30 = null, Day60 = null },
    new Billing{ Id = 6, Day30 = null, Day60 = null },
    new Billing{ Id = 9, Day30 = null, Day60 = null },
    new Billing{ Id = 10, Day30 = null, Day60 = null },
    new Billing{ Id = 11, Day30 = null, Day60 = null },
    new Billing{ Id = 64, Day30 = null, Day60 = null },
    new Billing{ Id = 53, Day30 = null, Day60 = null },
    new Billing{ Id = 16, Day30 = null, Day60 = null },
    new Billing{ Id = 17, Day30 = null, Day60 = null },
    new Billing{ Id = 18, Day30 = null, Day60 = null },
    new Billing{ Id = 19, Day30 = null, Day60 = null },
    new Billing{ Id = 20, Day30 = -36.52, Day60 = null },
    new Billing{ Id = 21, Day30 = 1843.30, Day60 = null },
};

var aggregate = customers.GroupJoin(
    billings, 
    customer => customer.Resp_Id, 
    billing => billing.Id, 
    (customer, AllBills) => new
    {
        customer.Id,
        customer.Resp_Id,
        customer.Name,
        AllBills
    });

foreach (var item in aggregate)
{
    Console.WriteLine($"{item.Id.ToString().PadLeft(2)}   {item.Resp_Id.ToString().PadLeft(2)}   {item.Name}");

    if(!item.AllBills.Any())
        Console.WriteLine("No bills found!");

    foreach (var bill in item.AllBills)
    {
        Console.WriteLine($"   {bill.Id.ToString().PadLeft(2)}   {bill.Day30}   {bill.Day60}");
    }

    Console.WriteLine();
}

Console.WriteLine("Finished");
Console.ReadKey();
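If a flat result is wanted instead, one row per customer/bill pair with unmatched customers kept once, the groups can be expanded with `SelectMany` over `DefaultIfEmpty()`, which is the usual LINQ spelling of a LEFT OUTER JOIN. A sketch with its own minimal record types so it compiles on its own (the names `Cust`, `Bill` and `CustomerBillRows` are assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;

public record Cust(int Id, int RespId, string Name);
public record Bill(int Id, double? Day30, double? Day60);

public static class FlatLeftJoin
{
    // One output row per (customer, matching bill); customers without a bill
    // appear once, with null billing columns.
    public static IEnumerable<(int Id, int RespId, string Name, int? BillId, double? Day30)>
        CustomerBillRows(IEnumerable<Cust> customers, IEnumerable<Bill> bills) =>
        customers
            .GroupJoin(bills, c => c.RespId, b => b.Id, (c, matches) => new { c, matches })
            .SelectMany(
                x => x.matches.DefaultIfEmpty(),   // yields a null Bill when there is no match
                (x, b) => (Id: x.c.Id, RespId: x.c.RespId, Name: x.c.Name,
                           BillId: b?.Id, Day30: b?.Day30));
}
```

Because the bill may be null, every billing column is read through `?.`.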

Answer 1 (score: 1)

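So you have Customers and Billings. Every Customer has a primary key in Id and a foreign key to a Billing in RespId.

Several Customers can have the same value for this foreign key. Normally this would be a one-to-many relation between Billings and Customers. However, some of your Customers have foreign key values that don't point to any Billing.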

class Customer
{
    public int Id {get; set;}            // primary key
    // ... other properties

    // every Customer refers to at most one Billing, using this foreign key:
    public int RespId {get; set;}        // wouldn't BillingId be a better name?
}
class Billing
{
    public int Id {get; set;}            // primary key
    // ... other properties
}

Now let's separate some concerns:

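We separate the conversion of your DataTables to IEnumerable<...> from your LINQ processing. This not only makes your problem easier to understand, it also makes the code easier to test, reuse and maintain: if your DataTables are replaced by, for example, a database or a CSV file, you won't have to change your LINQ statements.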

Create extension methods on DataTable to convert to IEnumerable and back. See Extension Methods Demystified.

public static IEnumerable<Customer> ToCustomers(this DataTable table)
{
    throw new NotImplementedException(); // TODO: implement
}
public static IEnumerable<Billing> ToBillings(this DataTable table)
{
    throw new NotImplementedException(); // TODO: implement
}

public static DataTable ToDataTable(this IEnumerable<Customer> customers) { throw new NotImplementedException(); } // TODO
public static DataTable ToDataTable(this IEnumerable<Billing> billings) { throw new NotImplementedException(); }   // TODO

You know DataTables better than I do, so I'll leave that code to you. For more information: Convert DataTable to IEnumerable and Convert IEnumerable to DataTable.
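One possible shape for those conversions, as a sketch only: it assumes `int` key columns and the column names from the sample data in the question (`ID`, `Resp_ID`, `Name`, `30_Day`, `60_Day`), and the `Name`, `Day30` and `Day60` properties stand in for the answer's "other properties". `Field<T>` turns DBNull into null when `T` is a nullable type.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Minimal stand-ins for the answer's classes; Name, Day30 and Day60 are assumed.
public class Customer { public int Id { get; set; } public int RespId { get; set; } public string Name { get; set; } }
public class Billing  { public int Id { get; set; } public double? Day30 { get; set; } public double? Day60 { get; set; } }

public static class DataTableConversions
{
    public static IEnumerable<Customer> ToCustomers(this DataTable table) =>
        table.AsEnumerable().Select(row => new Customer
        {
            Id = row.Field<int>("ID"),
            RespId = row.Field<int>("Resp_ID"),
            Name = row.Field<string>("Name"),
        });

    public static IEnumerable<Billing> ToBillings(this DataTable table) =>
        table.AsEnumerable().Select(row => new Billing
        {
            Id = row.Field<int>("ID"),
            Day30 = row.Field<double?>("30_Day"),   // DBNull becomes null
            Day60 = row.Field<double?>("60_Day"),
        });

    // Back to a DataTable: declare the columns once, then add one row per object.
    public static DataTable ToDataTable(this IEnumerable<Customer> customers)
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Resp_ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (var c in customers)
            table.Rows.Add(c.Id, c.RespId, c.Name);
        return table;
    }
}
```

Both `To...` methods are lazy `Select` projections, so nothing is read from the table until the result is enumerated.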

So now we have the following:

DataTable customersTable = ...
DataTable billingsTable = ...
IEnumerable<Customer> customers = customersTable.ToCustomers();
IEnumerable<Billing> billings = billingsTable.ToBillings();

We're ready for LINQ!

Your LINQ query

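If two sequences are related through a foreign key and you do an inner join, you won't get the Customers that have no matching Billing. If you do want them, you need a left outer join: Customers without a Billing get some default value for the Billing, usually null.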

LINQ has no built-in left outer join. You can find several solutions on Stack Overflow on how to mimic a left outer join. You can even write an extension function for it.


Such an extension function could look like this:

public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
    this IEnumerable<TLeft> leftCollection,      // the left collection
    IEnumerable<TRight> rightCollection,         // the right collection to join
    Func<TLeft, TKey> leftKeySelector,           // the function to select the left key
    Func<TRight, TKey> rightKeySelector,         // the function to select the right key
    Func<TLeft, TRight, TResult> resultSelector, // the function to create the result
    TRight defaultRight,                         // the value to use if there is no matching right element
    IEqualityComparer<TKey> keyComparer)         // the equality comparer to use
{
    // TODO: exceptions if null input that can't be repaired
    if (keyComparer == null) keyComparer = EqualityComparer<TKey>.Default;

    // for fast lookup: put all right elements in a Lookup using the right key and the keyComparer:
    var rightLookup = rightCollection
        .ToLookup(right => rightKeySelector(right), keyComparer);

    foreach (TLeft leftElement in leftCollection)
    {
         // get the left key to use:
         TKey leftKey = leftKeySelector(leftElement);
         // get all rights with this same key. Might be empty; in that case use defaultRight
         var matchingRightElements = rightLookup[leftKey]
             .DefaultIfEmpty(defaultRight);
         foreach (TRight rightElement in matchingRightElements)
         {
             TResult result = resultSelector(leftElement, rightElement);
             yield return result;
         }
    }
}

To make this function more reusable, create an overload without the keyComparer and defaultRight parameters:

public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
    this IEnumerable<TLeft> leftCollection,      // the left collection
    IEnumerable<TRight> rightCollection,         // the right collection to join
    Func<TLeft, TKey> leftKeySelector,           // the function to select the left key
    Func<TRight, TKey> rightKeySelector,         // the function to select the right key
    Func<TLeft, TRight, TResult> resultSelector) // the function to create the result
{
    // call the other overload with default(TRight) for defaultRight and null for keyComparer
    // (a plain null literal does not compile for an unconstrained TRight)
    return LeftOuterJoin(leftCollection, rightCollection,
        leftKeySelector, rightKeySelector, resultSelector,
        default(TRight), null);
}

Now that you have this very reusable function, let's create a function that left-outer-joins Customers and Billings:

public static IEnumerable<TResult> LeftOuterJoin<TResult>(
    this IEnumerable<Customer> customers,
    IEnumerable<Billing> billings,
    Func<Customer, Billing, TResult> resultSelector)
{
    return customers.LeftOuterJoin(billings,  // left outer join Customers and Billings
       customer => customer.RespId,           // from every Customer take the foreign key
       billing => billing.Id,                 // from every Billing take the primary key
       // from every customer with its matching (or default) billing
       // create one result:
       (customer, billing) => resultSelector(customer, billing));
}

Putting it all together, LINQ style: you didn't specify what you want in the result, so you'll have to write this last function yourself:

public static IEnumerable<CustomerBilling> LeftOuterJoinCustomerBilling(
    this IEnumerable<Customer> customers,
    IEnumerable<Billing> billings)
{
    // call the LeftOuterJoin with the correct function to create a CustomerBilling, something like:
    return customers.LeftOuterJoin(billings,
        (customer, billing) => new CustomerBilling()
        {    // select the columns you want to use:
             CustomerId = customer.Id,
             CustomerName = customer.Name,
             ...

             // billing is null (the default) when the Customer has no matching Billing
             BillingId = billing?.Id,
             BillingTotal = billing?.Total,
             ...
        });
}
Note that all of these functions except ToDataTable use deferred execution: nothing is enumerated until you call ToDataTable.
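A quick way to observe the deferred execution is to count calls to a key selector: building the query runs nothing, and enumerating it runs everything. This sketch uses the standard `GroupJoin`/`SelectMany`/`DefaultIfEmpty` pattern rather than the answer's own `LeftOuterJoin` helper, and the `KeyCalls` counter exists only for the demonstration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int KeyCalls; // how many times the left key selector has run

    public static IEnumerable<string> BuildLeftJoin(int[] left, int[] right) =>
        left.GroupJoin(right,
                l => { KeyCalls++; return l; },   // side effect, only to observe execution
                r => r,
                (l, matches) => new { l, matches })
            .SelectMany(x => x.matches.Select(m => m.ToString()).DefaultIfEmpty("none"),
                        (x, m) => $"{x.l}->{m}");
}
```

After `BuildLeftJoin(...)` returns, `KeyCalls` is still 0; only a `ToList()`, a `foreach`, or a conversion back to a DataTable makes the key selector run.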

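If you want, you can put everything into one big LINQ statement. That won't speed up your processing noticeably, but it will hurt readability, testability and maintainability.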

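Note that because we separated how the data is stored from how it is processed, the changes will be small if you decide to keep your data in a CSV file or a database, if you want different values in your CustomerBilling, or if your Customers get some extra fields.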

Answer 2 (score: 0)

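Harald and Oliver gave great answers, but I had already mentioned not having static classes. I start from a binary flat-file database that is parsed byte by byte into a byte[], and the binary is then converted into DataRows using a JSON definition file, right down to the data types. The result is an API that can query any flat file and return it as a DataTable, which can then be queried without static classes and converted to JSON to post to a web API.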

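This way I can quickly adjust queries and propagate changes without having to redefine static classes and complex relationships. Before I found LINQ last week, I had planned to export to a SQLite database and query that.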

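Being new to LINQ, I've learned a lot, but I couldn't work out how to phrase my question in terms of .AsEnumerable() data, or how to adapt answers that use static classes. Although their answers are valuable and may offer performance benefits, they don't fit my use case because of the flexibility requirements. Here is a stripped-down version of what I used: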

DataTable finalResults = ( from cus in customers.AsEnumerable()
    join bill in billing.AsEnumerable() on cus.Field<string>("Resp_ID") equals bill.Field<string>("ID") into cs
    from c in cs.DefaultIfEmpty()   // c is null when the customer has no billing row
    select new
    {
        reference_id = cus["CustomerId"],
        family_id = cus["Resp_ID"],
        last_name = cus["LastName"],
        first_name = cus["FirstName"],
        billing_31_60 = c == null ? "0" : c["billing_31_60"],
        billing_61_90 = c == null ? "0" : c["billing_61_90"],
        billing_over_90 = c == null ? "0" : c["billing_over_90"],
        billing_0_30 = c == null ? "0" : c["billing_0_30"]
    }).CopyToDataTable();   // assumes a custom CopyToDataTable for non-DataRow types
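The built-in `CopyToDataTable()` is declared only for sequences of `DataRow`, so calling it on a sequence of anonymous objects needs a custom helper. An alternative sketch writes the projected values straight into a new DataTable instead; the column names follow the query above, the `"0"` default mirrors its `c == null ? "0" : ...` fallback, and the `FlatResults.Build` name plus the `object`-typed output columns are assumptions:

```csharp
using System.Data;
using System.Linq;

public static class FlatResults
{
    // Left-joins customers to billing rows and writes the projection into a new
    // DataTable, so no CopyToDataTable() overload for anonymous types is needed.
    public static DataTable Build(DataTable customers, DataTable billing)
    {
        var result = new DataTable();
        string[] columns = { "reference_id", "family_id", "last_name", "first_name",
                             "billing_0_30", "billing_31_60", "billing_61_90", "billing_over_90" };
        foreach (string name in columns)
            result.Columns.Add(name, typeof(object)); // values are copied as-is

        var rows = from cus in customers.AsEnumerable()
                   join bill in billing.AsEnumerable()
                       on cus.Field<string>("Resp_ID") equals bill.Field<string>("ID") into matches
                   from c in matches.DefaultIfEmpty()   // c is null when there is no billing row
                   select new object[]
                   {
                       cus["CustomerId"], cus["Resp_ID"], cus["LastName"], cus["FirstName"],
                       c == null ? "0" : c["billing_0_30"],
                       c == null ? "0" : c["billing_31_60"],
                       c == null ? "0" : c["billing_61_90"],
                       c == null ? "0" : c["billing_over_90"],
                   };

        foreach (object[] values in rows)
            result.Rows.Add(values);
        return result;
    }
}
```

Typing the output columns as `object` keeps the sketch schema-agnostic; declaring concrete types instead would let later `Field<T>` calls work without casts.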