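I have incoming data in the form of DataTables. There are no static classes to rely on. I have 2 tables, customer and billing. There are 7000 customers and 1200 billing records.
Every customer record has a "ResponsiblePartyID"; several customers can share the same ID, and that ID refers to the ID of the billing table.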
DataTable customer= ETL.ParseTable("customer"); // 7000 records
DataTable billing= ETL.ParseTable("billing"); // 1200 records
var JoinedTables = (from c in customer.AsEnumerable()
                    join p in billing.AsEnumerable() on (string) c["ResponsiblePartyID"] equals (string) p["ID"] into ps
                    from p in ps.DefaultIfEmpty()
                    select new {c, p}
                   );
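I wouldn't even mind if it spat out the results in the wrong format, but it only returns 2200 results instead of 7000.
It would make sense if it returned only 1200 or all 7000, but 2200 is a weird place for it to stop.
I am manually parsing binary data into a data source, and I chose DataTable as the target because it seemed like the right approach. After working with LINQ and trying to do joins, I wonder whether I should reconsider.
LINQ doesn't seem to be designed for querying DataTables, because I have to call .AsEnumerable() at every step and then .CopyToDataTable() once each step is done.
I don't have static classes defined for all my data, because the properties of every value are already defined in the DataTable. So what is the "right" way to LEFT JOIN two DataTables (as in SQL), so that the rows on the right do not exclude rows on the left? If I start with a left table of 7000 rows, I want to end up with 7000 rows. If there is no matching record, fill it with null.
I don't want to define every column; it should return a flattened array / DataTable, something like this: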
var JoinedTables = (from c in customer.AsEnumerable()
                    join p in billing.AsEnumerable() on (string) c["ResponsiblePartyID"] equals (string) p["ID"] into ps
                    from p in ps.DefaultIfEmpty()
                    select ALL_COLUMNS
                   );
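Update:
I used the example from Jon Skeet's answer that was linked in the comments (Linq return all columns from all tables in the join). His solution is no different from my first attempt, and it still doesn't address how to flatten the results into a single DataTable. Here is a sample of the data and the current output: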
Customers
ID Resp_ID Name
1 1 Fatafehi
2 2 Dan
3 1 Anthony
4 1 Sekona
5 1 Osotonu
6 6 Robert
7 1 Lafo
8 1 Sarai
9 9 Esteban
10 10 Ashley
11 11 Mitch
12 64 Mark
13 11 Shawn
14 53 Kathy
15 53 Jasmine
16 16 Aubrey
17 17 Peter
18 18 Eve
19 19 Brenna
20 20 Shanna
21 21 Andrea
Billing
ID 30_Day 60_Day
2 null null
6 null null
9 null null
10 null null
11 null null
64 null null
53 null null
16 null null
17 null null
18 null null
19 null null
20 -36.52 null
21 1843.30 null
Output:
2 2 Dan 2 null null
6 6 Robert 6 null null
9 9 Esteban 9 null null
10 10 Ashley 10 null null
11 11 Mitch 11 null null
12 64 Mark 64 -131.20 null
13 11 Shawn 11 null null
14 53 Kathy 53 null null
15 53 Jasmine 53 null null
16 16 Aubrey 16 null null
17 17 Peter 17 null null
18 18 Eve 18 null null
19 19 Brenna 19 null null
20 20 Shanna 20 -36.52 null
21 21 Andrea 21 1843.30 null
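Notice that everyone with a Resp_ID of 1 is missing from the results. To display the output I used the following, and then inserted the null values for visualization: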
foreach (var row in joinedRows)
{
    Console.WriteLine(row.r1["ID"] + " " + row.r1["Resp_ID"] + " " + row.r1["Name"] + " " + row.r2["ID"] + " " + row.r2["30_Day"] + " " + row.r2["60_Day"]);
}
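Answer 0 (score: 1)
It's good that you gave some sample data, but it would be even better in a format that can be copied, pasted and used.
The relationship between customers and billing is one-to-many, where "many" can be zero, one or several. Because of this you have to use .GroupJoin() instead of .Join() (which is for a one-to-one relationship):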
var customers = new[]
{
new Customer{ Id = 1, Resp_Id = 1, Name = "Fatafehi" },
new Customer{ Id = 2, Resp_Id = 2, Name = "Dan" },
new Customer{ Id = 3, Resp_Id = 1, Name = "Anthony" },
new Customer{ Id = 4, Resp_Id = 1, Name = "Sekona" },
new Customer{ Id = 5, Resp_Id = 1, Name = "Osotonu" },
new Customer{ Id = 6, Resp_Id = 6, Name = "Robert" },
new Customer{ Id = 7, Resp_Id = 1, Name = "Lafo" },
new Customer{ Id = 8, Resp_Id = 1, Name = "Sarai" },
new Customer{ Id = 9, Resp_Id = 9, Name = "Esteban" },
new Customer{ Id = 10, Resp_Id = 10, Name = "Ashley" },
new Customer{ Id = 11, Resp_Id = 11, Name = "Mitch" },
new Customer{ Id = 12, Resp_Id = 64, Name = "Mark" },
new Customer{ Id = 13, Resp_Id = 11, Name = "Shawn" },
new Customer{ Id = 14, Resp_Id = 53, Name = "Kathy" },
new Customer{ Id = 15, Resp_Id = 53, Name = "Jasmine" },
new Customer{ Id = 16, Resp_Id = 16, Name = "Aubrey" },
new Customer{ Id = 17, Resp_Id = 17, Name = "Peter" },
new Customer{ Id = 18, Resp_Id = 18, Name = "Eve" },
new Customer{ Id = 19, Resp_Id = 19, Name = "Brenna" },
new Customer{ Id = 20, Resp_Id = 20, Name = "Shanna" },
new Customer{ Id = 21, Resp_Id = 21, Name = "Andrea" },
};
var billings = new[]
{
new Billing{ Id = 2, Day30 = null, Day60 = null },
new Billing{ Id = 6, Day30 = null, Day60 = null },
new Billing{ Id = 9, Day30 = null, Day60 = null },
new Billing{ Id = 10, Day30 = null, Day60 = null },
new Billing{ Id = 11, Day30 = null, Day60 = null },
new Billing{ Id = 64, Day30 = null, Day60 = null },
new Billing{ Id = 53, Day30 = null, Day60 = null },
new Billing{ Id = 16, Day30 = null, Day60 = null },
new Billing{ Id = 17, Day30 = null, Day60 = null },
new Billing{ Id = 18, Day30 = null, Day60 = null },
new Billing{ Id = 19, Day30 = null, Day60 = null },
new Billing{ Id = 20, Day30 = -36.52, Day60 = null },
new Billing{ Id = 21, Day30 = 1843.30, Day60 = null },
};
var aggregate = customers.GroupJoin(
    billings,
    customer => customer.Resp_Id,
    billing => billing.Id,
    (customer, AllBills) => new
    {
        customer.Id,
        customer.Resp_Id,
        customer.Name,
        AllBills
    });
foreach (var item in aggregate)
{
    Console.WriteLine($"{item.Id.ToString().PadLeft(2)} {item.Resp_Id.ToString().PadLeft(2)} {item.Name}");
    if (!item.AllBills.Any())
        Console.WriteLine("No bills found!");
    foreach (var bill in item.AllBills)
    {
        Console.WriteLine($" {bill.Id.ToString().PadLeft(2)} {bill.Day30} {bill.Day60}");
    }
    Console.WriteLine();
}
Console.WriteLine("Finished");
Console.ReadKey();
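The snippet above uses Customer and Billing classes that are not shown. Below is a minimal sketch of what they might look like; the property names follow the initialisers, but the types (int, double?) are my assumption. It also includes a hypothetical helper, LeftJoinDemo.Flatten, which turns the grouped result into one flat row per customer/bill pair and keeps customers without a bill:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes: names taken from the initialisers above, types are a guess.
public class Customer
{
    public int Id { get; set; }
    public int Resp_Id { get; set; }
    public string Name { get; set; }
}

public class Billing
{
    public int Id { get; set; }
    public double? Day30 { get; set; }
    public double? Day60 { get; set; }
}

public static class LeftJoinDemo
{
    // Hypothetical helper: GroupJoin followed by SelectMany + DefaultIfEmpty
    // yields one row per (customer, bill) pair; a customer with no bill
    // is paired with null instead of being dropped.
    public static List<(Customer Customer, Billing Bill)> Flatten(
        IEnumerable<Customer> customers, IEnumerable<Billing> billings)
    {
        return customers
            .GroupJoin(billings,
                customer => customer.Resp_Id,
                billing => billing.Id,
                (customer, allBills) => new { customer, allBills })
            .SelectMany(
                pair => pair.allBills.DefaultIfEmpty(),
                (pair, bill) => (Customer: pair.customer, Bill: bill))
            .ToList();
    }
}
```

With the sample data, LeftJoinDemo.Flatten(customers, billings) should return one row for every customer, with Bill set to null for those whose Resp_Id has no billing record.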
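Answer 1 (score: 1)
So you have Customers and Billings. Every Customer has a primary key in Id and a foreign key to a Billing in RespId.
Several Customers can have the same value for this foreign key. Normally this would be a one-to-many relationship between Billings and Customers. However, some of your Customers have a foreign key value that doesn't point to any Billing.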
class Customer
{
    public int Id {get; set;} // primary key
    ... // other properties
    // every Customer has exactly one Billing, using foreign key:
    public int RespId {get; set;} // wouldn't BillingId be a better Name?
}
class Billing
{
    public int Id {get; set;} // primary key
    ... // other properties
}
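Now let's separate some concerns:
We'll separate the conversion of your DataTables to IEnumerable<...> from your LINQ processing. This not only makes your problem easier to understand, it also makes it easier to test, reuse and maintain: if your DataTables change to, for instance, a database or a CSV file, you won't have to change your LINQ statements.
Create extension methods for DataTable to convert to IEnumerable and back. See extension methods Demystified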
public static IEnumerable<Customer> ToCustomers(this DataTable table)
{
    ... // TODO: implement
}
public static IEnumerable<Billing> ToBillings(this DataTable table)
{
    ... // TODO: implement
}
public static DataTable ToDataTable(this IEnumerable<Customer> customers) {...}
public static DataTable ToDataTable(this IEnumerable<Billing> billings) {...}
You know DataTables better than I do, so I'll leave the code to you. For more information: Convert DataTable to IEnumerable and Convert IEnumerable to DataTable
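For illustration only, here is one way the Customer half of these extension methods could be filled in. The column names "Id", "RespId" and "Name" are assumptions, as is the Name property; adjust both to your actual schema:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public class Customer
{
    public int Id { get; set; }
    public int RespId { get; set; }
    public string Name { get; set; }   // assumed extra property
}

public static class CustomerTableExtensions
{
    // Assumed column names: "Id", "RespId", "Name".
    // Convert.ToInt32 throws on DBNull, so key columns must be filled.
    public static IEnumerable<Customer> ToCustomers(this DataTable table)
    {
        foreach (DataRow row in table.Rows)
        {
            yield return new Customer
            {
                Id = Convert.ToInt32(row["Id"]),
                RespId = Convert.ToInt32(row["RespId"]),
                Name = Convert.ToString(row["Name"]),
            };
        }
    }

    // The reverse direction: one typed column per property.
    public static DataTable ToDataTable(this IEnumerable<Customer> customers)
    {
        var table = new DataTable("Customers");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("RespId", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (Customer customer in customers)
        {
            table.Rows.Add(customer.Id, customer.RespId, customer.Name);
        }
        return table;
    }
}
```

ToCustomers is an iterator, so rows are only read when the result is enumerated. ToBillings would follow the same pattern.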
So now we have the following:
DataTable customersTable = ...
DataTable billingsTable = ...
IEnumerable<Customer> customers = customersTable.ToCustomers();
IEnumerable<Billing> billings = billingsTable.ToBillings();
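We're ready to LINQ!
Your LINQ query
If two sequences are related through a foreign key and you perform a full inner join, you won't get the Customers that have no matching Billing. If you do want them, you need a left outer join: Customers without a Billing get some default value for Billing, usually null.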
LINQ doesn't have a left outer join. You can find several solutions on Stackoverflow on how to mimic a left-outer-join. You can even write an extension function for this:
public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
    this IEnumerable<TLeft> leftCollection,      // the left collection
    IEnumerable<TRight> rightCollection,         // the right collection to join
    Func<TLeft, TKey> leftKeySelector,           // the function to select left key
    Func<TRight, TKey> rightKeySelector,         // the function to select right key
    Func<TLeft, TRight, TResult> resultSelector, // the function to create the result
    TRight defaultRight,                         // the value to use if there is no right key
    IEqualityComparer<TKey> keyComparer)         // the equality comparer to use
{
    // TODO: exceptions if null input that can't be repaired
    if (keyComparer == null) keyComparer = EqualityComparer<TKey>.Default;
    if (defaultRight == null) defaultRight = default(TRight);
    // for fast Lookup: put all right elements in a Lookup using the right key and the keyComparer:
    var rightLookup = rightCollection
        .ToLookup(right => rightKeySelector(right), keyComparer);
    foreach (TLeft leftElement in leftCollection)
    {
        // get the left key to use:
        TKey leftKey = leftKeySelector(leftElement);
        // get all rights with this same key. Might be empty, in that case use defaultRight
        var matchingRightElements = rightLookup[leftKey]
            .DefaultIfEmpty(defaultRight);
        foreach (TRight rightElement in matchingRightElements)
        {
            TResult result = resultSelector(leftElement, rightElement);
            yield return result;
        }
    }
}
To make this function more reusable, create an overload without the keyComparer and defaultRight parameters:
public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
    this IEnumerable<TLeft> leftCollection,       // the left collection
    IEnumerable<TRight> rightCollection,          // the right collection to join
    Func<TLeft, TKey> leftKeySelector,            // the function to select left key
    Func<TRight, TKey> rightKeySelector,          // the function to select right key
    Func<TLeft, TRight, TResult> resultSelector)  // the function to create the result
{
    // call the other overload with defaults for defaultRight and keyComparer
    // (a plain null does not convert to an unconstrained TRight, hence default(TRight))
    return LeftOuterJoin(leftCollection, rightCollection,
        leftKeySelector, rightKeySelector, resultSelector,
        default(TRight), null);
}
Now that you have this very reusable function, let's create a function that left-outer-joins your customers and billings:
public static IEnumerable<TResult> LeftOuterJoin<TResult>(
    this IEnumerable<Customer> customers,
    IEnumerable<Billing> billings,
    Func<Customer, Billing, TResult> resultSelector)
{
    return customers.LeftOuterJoin(billings,  // left outer join Customer and Billings
        customer => customer.RespId,          // from every Customer take the foreign key
        billing => billing.Id,                // from every Billing take the primary key
        // from every customer with matching (or default) billings
        // create one result:
        (customer, billing) => resultSelector(customer, billing));
}
You didn't specify what you want in the result, so you'll have to write that function yourself:
public static IEnumerable<CustomerBilling> LeftOuterJoinCustomerBilling(
    this IEnumerable<Customer> customers,
    IEnumerable<Billing> billings)
{
    // call the LeftOuterJoin with the correct function to create a CustomerBilling, something like:
    return customers.LeftOuterJoin(billings,
        (customer, billing) => new CustomerBilling()
        {   // select the columns you want to use:
            CustomerId = customer.Id,
            CustomerName = customer.Name,
            ...
            // billing is null for a customer without a matching Billing,
            // so the billing-side properties must be nullable:
            BillingId = billing?.Id,
            BillingTotal = billing?.Total,
            ...
        });
}
Put it all together the LINQ way
Note that all functions except the last one use deferred execution: nothing is enumerated until you call ToDataTable.
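A compact, self-contained sketch of the deferred-execution point. It uses simplified stand-ins rather than the exact methods above: the Cust and Bill classes, the trimmed-down LeftOuterJoin and the DeferredDemo.Run helper are all hypothetical. A counter records how many customers have been read before and after the materialising call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cust { public int Id { get; set; } public int RespId { get; set; } public string Name { get; set; } }
public class Bill { public int Id { get; set; } }

public static class DeferredDemo
{
    // Trimmed-down left outer join: an unmatched left item is paired with default(TRight).
    public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> resultSelector)
    {
        var lookup = right.ToLookup(rightKey);
        foreach (TLeft l in left)
        {
            foreach (TRight r in lookup[leftKey(l)].DefaultIfEmpty())
            {
                yield return resultSelector(l, r);
            }
        }
    }

    // Reports how many customers were read before and after ToList(),
    // plus the number of joined rows.
    public static (int Before, int After, int Rows) Run()
    {
        int read = 0;
        IEnumerable<Cust> customers = new[]
        {
            new Cust { Id = 1, RespId = 1, Name = "Fatafehi" },
            new Cust { Id = 2, RespId = 2, Name = "Dan" },
        }.Select(c => { read++; return c; });   // counts every customer that is actually read
        var billings = new[] { new Bill { Id = 2 } };

        var query = customers.LeftOuterJoin(billings,
            c => c.RespId, b => b.Id,
            (c, b) => new { c.Name, BillingId = b?.Id });

        int before = read;           // the query is built, but nothing has been enumerated
        var rows = query.ToList();   // enumeration happens here
        return (before, read, rows.Count);
    }
}
```

DeferredDemo.Run() returns (0, 2, 2): no customer is read while the query is being composed, and both are read, giving two joined rows, once ToList() runs.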
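If you want, you can put everything into one big LINQ statement. This won't speed up your processing much, but it will hurt readability, testability and maintainability.
Note that because we separated the way your data is stored from the way it is processed, changes will be minimal if you decide to keep your data in CSV files or a database, if you want different values in your CustomerBilling, or if your Customer gets some extra fields.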
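Answer 2 (score: 0)
Harald and Oliver provided great answers, but I had already said that I don't have static classes. I start with a binary flat-file database that is parsed byte by byte into byte[]; then, using a JSON definition file, all the binary conversions are applied to produce DataRows, right down to determining the data types. The result is an API that can query any flat file and return it as a DataTable, which can then be queried without static classes and converted to JSON to post to a web API.
This way I can quickly tweak queries and propagate changes without having to redefine static classes and complex relationships. I had originally planned to export to a SQLite database and query that, before I found LINQ last week.
Since I'm still new to LINQ, I had a lot to learn. I couldn't work out how to ask the question about data in .AsEnumerable() form, or how to adapt the answers that used static classes. Their answers are valuable and probably offer performance benefits, but they don't fit my use case because of the flexibility requirement. Here is a trimmed-down version of what I used: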
DataTable finalResults = (from cus in customers.AsEnumerable()
                          join bill in billing.AsEnumerable()
                              on cus.Field<string>("Resp_ID") equals bill.Field<string>("ID") into cs
                          from c in cs.DefaultIfEmpty() // c is null when the customer has no billing row
                          select new
                          {
                              reference_id = cus["CustomerId"],
                              family_id = cus["Resp_ID"],
                              last_name = cus["LastName"],
                              first_name = cus["FirstName"],
                              billing_31_60 = c == null ? "0" : c["billing_31_60"],
                              billing_61_90 = c == null ? "0" : c["billing_61_90"],
                              billing_over_90 = c == null ? "0" : c["billing_over_90"],
                              billing_0_30 = c == null ? "0" : c["billing_0_30"]
                          }).CopyToDataTable(); // requires a CopyToDataTable overload for non-DataRow types; the built-in one only accepts sequences of DataRow
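For the "all columns, no static classes" part of the original question, here is a rough sketch of a generic helper. The name DataTableJoin.LeftJoin is hypothetical and not part of System.Data. It builds the result schema from both tables, prefixes right-hand column names with "right_" when they collide, and pads unmatched rows with DBNull. Comparing keys as strings is an assumption:

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableJoin
{
    public static DataTable LeftJoin(
        DataTable left, DataTable right, string leftKey, string rightKey)
    {
        var result = new DataTable();

        // Copy the schema of both tables so no column is named by hand.
        foreach (DataColumn col in left.Columns)
            result.Columns.Add(col.ColumnName, col.DataType);
        foreach (DataColumn col in right.Columns)
        {
            // Avoid duplicate names such as "ID" by prefixing the right table's column.
            // (Assumes the prefixed name does not itself collide.)
            string name = result.Columns.Contains(col.ColumnName)
                ? "right_" + col.ColumnName
                : col.ColumnName;
            result.Columns.Add(name, col.DataType);
        }

        // Index the right table once by its key, compared as a string.
        // Caveat: DBNull keys become "" and would match each other, unlike SQL.
        var lookup = right.Rows.Cast<DataRow>()
            .ToLookup(row => Convert.ToString(row[rightKey]));

        foreach (DataRow l in left.Rows)
        {
            var matches = lookup[Convert.ToString(l[leftKey])].ToList();
            if (matches.Count == 0)
            {
                // No match: keep the left row and pad the right side with DBNull.
                var padding = Enumerable.Repeat<object>(DBNull.Value, right.Columns.Count);
                result.Rows.Add(l.ItemArray.Concat(padding).ToArray());
            }
            foreach (DataRow r in matches)
                result.Rows.Add(l.ItemArray.Concat(r.ItemArray).ToArray());
        }
        return result;
    }
}
```

Called as DataTableJoin.LeftJoin(customers, billing, "Resp_ID", "ID"), this should return at least one row for every customer row, whether or not a billing row matches.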