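I have two DataTables with different structures. In table A the "Campaign ID" column is not unique, and I want to join it to table B, where "Campaign ID" is unique.
So the equivalent of this in SQL:
select * from
A left join B on
A.[Campaign ID] = B.[Campaign ID]
I tried DataTable.Merge, but it does not work because it can only merge on a unique column.
I have also tried LINQ and lambdas.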
var resultDt = from c in dt.AsEnumerable()
join lookup in lookupDt.AsEnumerable() on c["Campaign ID"].ToString() equals lookup["EventID"]
.ToString() into results
from r in results.DefaultIfEmpty()
select new { a=c, b =lookup };
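It returns two sets of DataRows rather than one set of DataRows.
I also tried a dictionary, but it was too expensive to run.
As for the result: if I select r, it returns only the values from table B.
I expected the output to be like
select * from
A left join B on
A.[Campaign ID] = B.[Campaign ID]
in SQL.
If table A looks like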
Campaign ID                           Description    Number
eda1e64c-0002-4000-8000-000000000198
eda1e64c-0002-4000-8000-000000000198
eda1e64c-0002-4000-8000-000000000198
eda1e64c-0002-4000-8000-000000000198
eda1e64c-0002-4000-8000-000000000000  Testing 123    1111
                                      Description 2  3333
and table B looks like
Campaign ID                           Name
eda1e64c-0002-4000-8000-000000000198  Test Name1
eda1e64c-0002-4000-8000-000000000000  Test Name2
Expected result
Campaign ID                           Description    Number  Name
eda1e64c-0002-4000-8000-000000000198                         Test Name1
eda1e64c-0002-4000-8000-000000000198                         Test Name1
eda1e64c-0002-4000-8000-000000000198                         Test Name1
eda1e64c-0002-4000-8000-000000000198                         Test Name1
eda1e64c-0002-4000-8000-000000000000  Testing 123    1111    Test Name2
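Is there a built-in C# method or an efficient way to do this? Many thanks for any help.
Answer 0 (score: 1)
I think you're nearly there. You just need to project the LINQ query output into object arrays and then load those as rows into a new DataTable. Remember that LINQ is mainly for querying and returning result collections, not for modifying existing things:
Left join with LINQ, manual output list, manual consumption into a DataTable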
// dt is table A and lookupDt is table B (variable names taken from the question)
var query =
    from ce in dt.AsEnumerable()
    join le in lookupDt.AsEnumerable() on ce.Field<Guid>("Campaign ID") equals le.Field<Guid>("Campaign ID") into cele
    from lenull in cele.DefaultIfEmpty()
    select new object[]
    {
        ce.Field<Guid>("Campaign ID"),   // assumes Campaign ID is never null in the rows being joined
        ce.Field<string>("Description"),
        ce.Field<int?>("Number"),        // nullable, because some sample rows have no Number
        lenull?.Field<string>("Name")    // null when there is no matching row in table B
    };
DataTable result = new DataTable(); //to hold results
result.Columns.Add("Campaign ID", typeof(Guid));
result.Columns.Add("Description");
result.Columns.Add("Number", typeof(int));
result.Columns.Add("Name");
foreach (var at in query)
    result.Rows.Add(at);
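Because lenull may be null, I use the null-conditional operator (?.) to avoid a NullReferenceException when reading a field from a null row. We can also do this dynamically, without resorting to reflection, but it is considerably slower. For the following examples I used my own simple pair of DataTables, set up as follows: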
//setup part
DataTable a = new DataTable();
a.Columns.Add("ID", typeof(int));
a.Columns.Add("Name", typeof(string));
a.Columns.Add("Age", typeof(int));
DataTable b = new DataTable();
var pk = b.Columns.Add("ID", typeof(int));
b.Columns.Add("Address", typeof(string));
b.Columns.Add("YearsAt", typeof(int));
b.PrimaryKey = new[] { pk };
a.Rows.Add(1, "John", 22);
a.Rows.Add(2, "Mary", 33);
a.Rows.Add(3, "Bill", 44);
b.Rows.Add(1, "JohnAddr", 3);
b.Rows.Add(2, "MaryAddr", 4);
Left join via LINQ, manual output list, dynamic consumption
var query =
from ae in a.AsEnumerable()
join be in b.AsEnumerable() on ae.Field<int>("ID") equals be.Field<int>("ID") into aebe
from be2 in aebe.DefaultIfEmpty()
select new Dictionary<string, object>
{
{"ID", ae.Field<int>("ID")},
{"Name", ae.Field<string>("Name") },
{"Age", ae.Field<int>("Age") },
{"Address", be2?.Field<string>("Address") },
{"YearsAt", be2?.Field<int>("YearsAt") }
};
//setup datatable
DataTable c = new DataTable();
int keyCount = query.First().Keys.Count; //track columns needed to be added
foreach (var dict in query)
{
var ro = c.NewRow();
foreach (string key in dict.Keys)
{
if (keyCount > 0 && dict[key] != null && !c.Columns.Contains(key))
{ //if the column is not in the table, and the value isnt null (so we can deduce the type)
c.Columns.Add(key, dict[key].GetType());
keyCount--; //mark it as added. Eventually this will hit 0 and we won't evaluate the other two clauses
}
if (dict[key] != null) //don't store nulls
ro[key] = dict[key];
}
c.Rows.Add(ro);
}
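Of course, you might complain that you still have to list every column you want in the LINQ select. We can make that dynamic too:
Left join via LINQ, dynamic output list, dynamic consumption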
var query =
from ae in a.AsEnumerable()
join be in b.AsEnumerable() on ae.Field<int>("ID") equals be.Field<int>("ID") into aebe
from be2 in aebe.DefaultIfEmpty()
select MapToDict(ae, be2);
//setup datatable
DataTable c = new DataTable();
int keyCount = query.First().Keys.Count;
foreach (var dict in query)
{
//have we got all our columns added yet?
var ro = c.NewRow();
foreach (string key in dict.Keys)
{
if (keyCount > 0 && dict[key] != null && !c.Columns.Contains(key))
{ //if the column is not in the table, and the value isnt null (so we can deduce the type)
c.Columns.Add(key, dict[key].GetType());
keyCount--; //mark it as added. Eventually this will hit 0 and we won't evaluate the other two clauses
}
if (dict[key] != null) //don't store nulls
ro[key] = dict[key];
}
c.Rows.Add(ro);
}
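MapToDict is called in the query above but its body is not shown. A minimal sketch of what such a helper might look like follows; the class name JoinHelpers and the choice to suffix clashing column names with "_" are assumptions made for illustration, not the answerer's actual code.

```csharp
using System.Collections.Generic;
using System.Data;

static class JoinHelpers
{
    // Hypothetical helper: copies every column of the left row, then every column
    // of the right row (if there is one), into a dictionary keyed by column name.
    public static Dictionary<string, object> MapToDict(DataRow left, DataRow right)
    {
        var dict = new Dictionary<string, object>();

        foreach (DataColumn col in left.Table.Columns)
            dict[col.ColumnName] = left.IsNull(col) ? null : left[col];

        // No match on the right side: only the left-hand keys are present.
        if (right == null)
            return dict;

        foreach (DataColumn col in right.Table.Columns)
        {
            // Assumption: on a name clash, append "_" until the key is unique.
            string key = col.ColumnName;
            while (dict.ContainsKey(key))
                key += "_";
            dict[key] = right.IsNull(col) ? null : right[col];
        }
        return dict;
    }
}
```

With this sketch, an unmatched row yields fewer keys than a matched one. The consuming loop takes its column count from the first dictionary, so it appears to rely on the first row having a match, as it does in the sample data.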
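I've never liked doing joins on DataTables in LINQ. I've always preferred a plain loop: add B's columns to A, then for each row in A find the matching row in B by its primary key and copy the values across.
Here is the code that does that:
Left join using a loop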
//ensure unique named columns in b, and grow a's columns
foreach (DataColumn bcol in b.Columns) {
while (a.Columns.Contains(bcol.ColumnName))
bcol.ColumnName += "_";
a.Columns.Add(bcol.ColumnName, bcol.DataType);
}
//perform left join
foreach (DataRow aro in a.Rows) {
var f = b.Rows.Find(aro["ID"]);
if (f != null)
foreach (DataColumn bcol in b.Columns)
aro[bcol.ColumnName] = f[bcol];
}
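Turning this into an extension method would probably be trivial, so that any table could have another table joined onto it with a call like a.LeftJoin(b, aID: "ID", bID: "ID"). If you want logic more complex than simple equality, the code will need some changes.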
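As a rough sketch of that extension-method idea: the method name and parameter names follow the call shape mentioned above, but the body is an assumption. In particular, it indexes b with a Dictionary rather than using Rows.Find, so b does not need a primary key, and it leaves b's column names untouched. It also assumes the two key columns hold the same .NET type and that bID values are unique in b.

```csharp
using System.Collections.Generic;
using System.Data;

static class DataTableExtensions
{
    // Hypothetical extension: left-joins b onto a in place, matching a[aID] == b[bID].
    public static DataTable LeftJoin(this DataTable a, DataTable b, string aID, string bID)
    {
        // Index b by key once (assumes bID values are unique in b).
        var index = new Dictionary<object, DataRow>();
        foreach (DataRow bro in b.Rows)
            if (!bro.IsNull(bID))
                index[bro[bID]] = bro;

        // Add b's columns to a, suffixing "_" on name clashes.
        var map = new List<KeyValuePair<DataColumn, DataColumn>>();
        foreach (DataColumn bcol in b.Columns)
        {
            string name = bcol.ColumnName;
            while (a.Columns.Contains(name))
                name += "_";
            DataColumn acol = a.Columns.Add(name, bcol.DataType);
            map.Add(new KeyValuePair<DataColumn, DataColumn>(bcol, acol));
        }

        // Copy matching values; unmatched rows keep DBNull in the new columns.
        foreach (DataRow aro in a.Rows)
        {
            if (aro.IsNull(aID) || !index.TryGetValue(aro[aID], out DataRow match))
                continue;
            foreach (var pair in map)
                aro[pair.Value] = match[pair.Key];
        }
        return a;
    }
}
```

Because the lookup is on boxed key values, an int column joined to a long column would never match; that case would need an explicit conversion before indexing.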
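Out of curiosity, I ran all 4 approaches back to back and timed them. In my setup, the loop was 2.5x faster than LINQ with a fixed structure and hardcoded column names, and 4x faster than the version that uses dictionaries to make things dynamic: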
for (int lc = 0; lc < 10; lc++) {
//setup 100K rows
DataTable a = new DataTable();
a.Columns.Add("ID", typeof(int));
a.Columns.Add("Name", typeof(string));
a.Columns.Add("Age", typeof(int));
DataTable b = new DataTable();
var pk = b.Columns.Add("ID", typeof(int));
b.Columns.Add("Address", typeof(string));
b.Columns.Add("YearsAt", typeof(int));
b.PrimaryKey = new[] { pk };
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
a.Rows.Add(i, Guid.NewGuid().ToString(), r.Next(20, 99));
if (r.Next(0, 9) < 1)
b.Rows.Add(i, Guid.NewGuid().ToString(), r.Next(1, 10));
}
Stopwatch sw = Stopwatch.StartNew();
// ### INSERT CHOSEN METHOD HERE ###
sw.Stop();
Console.WriteLine($"Time: {sw.ElapsedMilliseconds}ms");
}
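Over 100K rows, the loop typically took 80 ms, the hardcoded LINQ version (manual select, manual table) typically took 200 ms, and the LINQ dictionary (dynamic) approach took 400 ms.
Answer 1 (score: 0)
How about this?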
TableAlist.Select(A => new {
    A.CampaignId, A.Description, A.Number,
    Name = TableBlist.FirstOrDefault(B => B.CampaignId == A.CampaignId)?.Name ?? ""
}).ToList()