我正在尝试基于动态密钥对两个表进行LINQ。用户可以通过组合框更改密钥。键可能是money,string,double,int等。目前我收到的数据很好,但没有过滤掉双打。我可以在VB中过滤掉double,但它是slooooow。我想在LINQ查询中做到这一点。
以下是数据:
第一张表:
-------------------------------------------------------------
| AppleIndex | AppleCost | AppleColor | AppleDescription |
------------------------------------------------------------
| 1 | 3 | Red | This is an apple |
| 2 | 5 | Green | This is an apple |
| 3 | 4 | Pink | This is an apple |
| 4 | 2 | Yellow | This is an apple |
| 5 | 2 | Orange | This is an apple |
| 1 | 3 | Red | This is a duplicate|
| 2 | 5 | Green | This is a duplicate|
| 3 | 4 | Pink | This is a duplicate|
| 4 | 2 | Yellow | This is a duplicate|
| 5 | 2 | Orange | This is a duplicate|
-------------------------------------------------------------
第二张表:
------------------------------------------------------------
| OrangeIndex | OrangeCost | OrangeColor | OrangeDescription |
------------------------------------------------------------
| 1 | 1 | Orange | This is an Orange |
| 2 | 3 | Orange | |
| 3 | 2 | Orange | This is an Orange |
| 4 | 3 | Orange | |
| 5 | 2 | Orange | This is an Orange |
------------------------------------------------------------
目前,我使用以下代码来获取太多数据:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows Distinct
结果:
-------------------------------------------------------------------------
| 1 | 3 | Red | This is an apple | 1 | Orange | This is an Orange |
| 1 | 3 | Red | This is an duplicate | 1 | Orange | This is an Orange |
| 2 | 5 | Green | This is an apple | 3 | Orange | |
| 2 | 5 | Green | This is an duplicate | 3 | Orange | |
| 3 | 4 | Pink | This is an apple | 2 | Orange | This is an Orange |
| 3 | 4 | Pink | This is an duplicate | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 3 | Orange | |
| 4 | 2 | Yellow | This is an duplicate | 3 | Orange | |
| 5 | 2 | Orange | This is an apple | 2 | Orange | This is an Orange |
| 5 | 2 | Orange | This is an duplicate | 2 | Orange | This is an Orange |
-------------------------------------------------------------------------
期望的结果:
------------------------------------------------------------------------
| 1 | 3 | Red | This is an apple | 1 | 1 | Orange | This is an Orange |
| 2 | 5 | Green | This is an apple | 2 | 3 | Orange | |
| 3 | 4 | Pink | This is an apple | 3 | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 4 | 3 | Orange | |
| 5 | 2 | Orange | This is an apple | 5 | 2 | Orange | This is an Orange |
------------------------------------------------------------------------
我尝试了以下内容:
'Get the original Column Names into an Array List
'MasterTableColumns = GetColumns(qMasterDS, TheMasterTable) '(external code)
'Plug the Existing DataSet into a DataView:
Dim View As DataView = New DataView(qMasterTable)
'Sort by the Primary Key:
View.Sort = ThePrimaryKey
'Build a new table listing only one column:
Dim newListTable As DataTable = _
View.ToTable("UniqueData", True, ThePrimaryKey)
这会返回唯一列表,但不会返回相关数据:
-------------
| AppleIndex |
-------------
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
-------------
所以我尝试了这个:
'Build a new table with ALL the columns:
Dim newFullTable As DataTable = _
View.ToTable("UniqueData", True, _
MasterTableColumns(0), _
MasterTableColumns(1), _
MasterTableColumns(2), _
MasterTableColumns(3))
不幸的是,它会产生以下内容...... 带有重复项:
-------------------------------------------------------------
| AppleIndex | AppleCost | AppleColor | AppleDescription |
------------------------------------------------------------
| 1 | 3 | Red | This is an apple |
| 2 | 5 | Green | This is an apple |
| 3 | 4 | Pink | This is an apple |
| 4 | 2 | Yellow | This is an apple |
| 5 | 2 | Orange | This is an apple |
| 1 | 3 | Red | This is a duplicate|
| 2 | 5 | Green | This is a duplicate|
| 3 | 4 | Pink | This is a duplicate|
| 4 | 2 | Yellow | This is a duplicate|
| 5 | 2 | Orange | This is a duplicate|
-------------------------------------------------------------
有什么想法吗?
~~~~~~~~~~~~更新:~~~~~~~~~~~~
Jeff M建议使用以下代码。 (谢谢杰夫)然而,它给了我一个错误。有谁知道在VB中使这个工作的语法?我有点唠叨,似乎无法做到正确。
Dim matches = _
From mRows In (From row In LinqMasterTable _
Group row By row(ThePrimaryKey) Into g() _
Select g.First()) _
Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
“row(ThePrimaryKey)”第三行出错:
“范围变量名称只能从没有参数的简单或限定名称推断。”
答案 0 :(得分:1)
嗯,基本问题不是LINQ。这是你的第一个表包含“重复”的事实,它们并不是真的重复,因为在你的例子中,每一行都是独特的。
所以,我们的问题是“我们如何识别原始表格中的重复项?”。一旦得到回答,剩下的应该是微不足道的。
例如(在C#中因为我不确定VB语法)
var Matches = from mRows in LinqMasterTable
.Where(r=>r.AppleDescription=="This is an Apple")
join sRows in LinqSecondTable
on mRows(ThePrimaryKey) equals sRows(TheForignKey)
orderby mRows(ThePrimaryKey)
select new { mRows, sRows};
答案 1 :(得分:0)
修改强>
这是我如何编写C#LINQ查询。这是一个替代版本,而不是使用Distinct()
,使用嵌套查询和分组,它应具有相似的语义。它应该很容易转换为VB。
var matches = from mRows in (from row in LinqMasterTable
group row by row[ThePrimaryKey] into g
select g.First())
join sRows in LinqSecondTable
on mRows[ThePrimaryKey] Equals sRows[TheForignKey]
orderby mRows[ThePrimaryKey]
select new { mRows, sRows }
我尝试上面的VB版本:
修改强>
至于最近的错误,我确切知道如何处理它。当我使用VB LINQ时,我发现编译器不喜欢复杂的分组表达式。要解决这个问题,请将row(ThePrimaryKey)
分配给临时变量并按该变量分组。它应该工作。
Dim matches = From mRows In (From row In LinqMasterTable _
Let grouping = row(ThePrimaryKey)
Group row By grouping Into g() _
Select g.First()) _
Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
实际上,经过第二次检查,结果是分组的内容需要一个名字。以下内容可行。
Dim matches = From mRows In (From row In LinqMasterTable _
Group row By Grouping = row(ThePrimaryKey) Into g() _
Select g.First()) _
Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
答案 2 :(得分:0)
声明等:
Private Sub LinqTwoTableInnerJoin(ByRef qMasterDS As DataSet, _
ByRef qMasterTable As DataTable, _
ByRef qSecondDS As DataSet, _
ByRef qSecondTable As DataTable, _
ByRef qPrimaryKey As String, _
ByRef qForignKey As String, _
ByVal qResultsName As String)
Dim TheMasterTable As String = qMasterTable.TableName
Dim TheSecondTable As String = qSecondTable.TableName
Dim ThePrimaryKey As String = qPrimaryKey
Dim TheForignKey As String = qForignKey
Dim TheNewForignKey As String = ""
MasterTableColumns = GetColumns(qMasterDS, TheMasterTable)
SecondTableColumns = GetColumns(qSecondDS, TheSecondTable)
Dim mColumnCount As Integer = MasterTableColumns.Count
Dim sColumnCount As Integer = SecondTableColumns.Count
Dim ColumnCount As Integer = mColumnCount + sColumnCount
Dim LinqMasterTable = qMasterDS.Tables(TheMasterTable).AsEnumerable
Dim LinqSecondTable = qSecondDS.Tables(TheSecondTable).AsEnumerable
获取数据并按选定键进行排序:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
将结果放入数据集表:
' Make sure the dataset is available and/or cleared:
If dsResults.Tables(qResultsName) Is Nothing Then dsResults.Tables.Add(qResultsName)
dsResults.Tables(qResultsName).Clear() : dsResults.Tables(qResultsName).Columns.Clear()
'Adds Master Table Column Names
For x = 0 To MasterTableColumns.Count - 1
dsResults.Tables(qResultsName).Columns.Add(MasterTableColumns(x))
Next
'Rename Second Table Names if Needed:
For x = 0 To SecondTableColumns.Count - 1
With dsResults.Tables(qResultsName)
For y = 0 To .Columns.Count - 1
If SecondTableColumns(x) = .Columns(y).ColumnName Then
SecondTableColumns(x) = SecondTableColumns(x) & "_2"
End If
Next
End With
Next
'Make sure that the Forign Key is a Unique Value
If ForignKey1 = PrimaryKey Then
TheNewForignKey = ForignKey1 & "_2"
Else
TheNewForignKey = ForignKey1
End If
'Adds Second Table Column Names
For x = 0 To SecondTableColumns.Count - 1
dsResults.Tables(qResultsName).Columns.Add(SecondTableColumns(x))
Next
'Copy Results into the Dataset:
For Each Match In Matches
'Build an array for each row:
Dim NewRow(ColumnCount - 1) As Object
'Add the mRow Items:
For x = 0 To MasterTableColumns.Count - 1
NewRow(x) = Match.mRows.Item(x)
Next
'Add the srow Items:
For x = 0 To SecondTableColumns.Count - 1
Dim y As Integer = x + (MasterTableColumns.Count)
NewRow(y) = Match.sRows.Item(x)
Next
'Add the array to dsResults as a Row:
dsResults.Tables(qResultsName).Rows.Add(NewRow)
Next
为用户提供清除双打的选项:
If chkUnique.Checked = True Then
ReMoveDuplicates(dsResults.Tables(qResultsName), ThePrimaryKey)
End If
如果他们愿意,请删除重复项:
Private Sub ReMoveDuplicates(ByRef SkipTable As DataTable, _
ByRef TableKey As String)
'Make sure that there's data to work with:
If SkipTable Is Nothing Then Exit Sub
If TableKey Is Nothing Then Exit Sub
'Create an ArrayList of rows to delete:
Dim DeleteRows As New ArrayList()
'Fill the Array with Row Number of the items equal
'to the item above them:
For x = 1 To SkipTable.Rows.Count - 1
Dim RowOne As DataRow = SkipTable.Rows(x - 1)
Dim RowTwo As DataRow = SkipTable.Rows(x)
If RowTwo.Item(TableKey) = RowOne.Item(TableKey) Then
DeleteRows.Add(x)
End If
Next
'If there are no hits, exit this sub:
If DeleteRows.Count < 1 Or DeleteRows Is Nothing Then
Exit Sub
End If
'Otherwise, remove the rows based on the row count value:
For x = 0 To DeleteRows.Count - 1
'Start at the END and count backwards so the duplicate
'item's row count value doesn't change with each deleted row
Dim KillRow As Integer = DeleteRows((DeleteRows.Count - 1) - x)
'Delete the row:
SkipTable.Rows(KillRow).Delete()
Next
End Sub
然后清理剩余物:
If Not chkRetainKeys.Checked = True Then 'Removes Forign Key
dsResults.Tables(qResultsName).Columns.Remove(TheNewForignKey)
End If
'Clear Arrays
MasterTableColumns.Clear()
SecondTableColumns.Clear()
最终分析: 将这个包含4个列,65,535行和一些双打的2个文件。处理时间,大约1秒。实际上,将字段加载到内存中需要的时间比解析数据所花费的时间要长。