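LINQ join on a parameterized distinct key

Time: 2010-09-09 17:10:14

Tags: vb.net linq sorting merge distinct-values

I am trying to join two tables with LINQ on a dynamic key. The user can change the key through a combo box. The key may be money, string, double, int, and so on. Currently I get the data back fine, but the duplicates are not filtered out. I can filter out the duplicates in VB, but it is slooooow. I would like to do it in the LINQ query.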

Here is the data:

First table:

 -------------------------------------------------------------
| AppleIndex  | AppleCost  | AppleColor  | AppleDescription   |
 ------------------------------------------------------------
|     1       |     3      | Red         | This is an apple   |
|     2       |     5      | Green       | This is an apple   |
|     3       |     4      | Pink        | This is an apple   |
|     4       |     2      | Yellow      | This is an apple   |
|     5       |     2      | Orange      | This is an apple   |
|     1       |     3      | Red         | This is a duplicate|
|     2       |     5      | Green       | This is a duplicate|
|     3       |     4      | Pink        | This is a duplicate|
|     4       |     2      | Yellow      | This is a duplicate|
|     5       |     2      | Orange      | This is a duplicate|
 -------------------------------------------------------------

Second table:

 ------------------------------------------------------------
| OrangeIndex | OrangeCost | OrangeColor | OrangeDescription |
 ------------------------------------------------------------
|     1       |     1      | Orange      | This is an Orange |
|     2       |     3      | Orange      |                   |
|     3       |     2      | Orange      | This is an Orange |
|     4       |     3      | Orange      |                   |
|     5       |     2      | Orange      | This is an Orange |
 ------------------------------------------------------------

Currently I use the following code, which returns too much data:

Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
              On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows Distinct

Result:

 -------------------------------------------------------------------------
| 1  | 3 | Red    | This is an apple     | 1 | Orange | This is an Orange |
| 1  | 3 | Red    | This is a duplicate  | 1 | Orange | This is an Orange |
| 2  | 5 | Green  | This is an apple     | 3 | Orange |                   |
| 2  | 5 | Green  | This is a duplicate  | 3 | Orange |                   |
| 3  | 4 | Pink   | This is an apple     | 2 | Orange | This is an Orange |
| 3  | 4 | Pink   | This is a duplicate  | 2 | Orange | This is an Orange |
| 4  | 2 | Yellow | This is an apple     | 3 | Orange |                   |
| 4  | 2 | Yellow | This is a duplicate  | 3 | Orange |                   |
| 5  | 2 | Orange | This is an apple     | 2 | Orange | This is an Orange |
| 5  | 2 | Orange | This is a duplicate  | 2 | Orange | This is an Orange |
 -------------------------------------------------------------------------

Desired result:

 ------------------------------------------------------------------------
| 1 | 3 | Red    | This is an apple | 1 | 1 | Orange | This is an Orange |
| 2 | 5 | Green  | This is an apple | 2 | 3 | Orange |                   |
| 3 | 4 | Pink   | This is an apple | 3 | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 4 | 3 | Orange |                   |
| 5 | 2 | Orange | This is an apple | 5 | 2 | Orange | This is an Orange |
 ------------------------------------------------------------------------

I tried the following:

'Get the original Column Names into an Array List
'MasterTableColumns = GetColumns(qMasterDS, TheMasterTable) '(external code)

'Plug the Existing DataSet into a DataView:
Dim View As DataView = New DataView(qMasterTable)

'Sort by the Primary Key:
View.Sort = ThePrimaryKey

'Build a new table listing only one column:
Dim newListTable As DataTable = _
View.ToTable("UniqueData", True, ThePrimaryKey)

This returns a unique list, but without the related data:

 -------------
| AppleIndex  |
 -------------
|     1       | 
|     2       | 
|     3       |
|     4       |
|     5       |
 -------------

So I tried this:

'Build a new table with ALL the columns:
Dim newFullTable As DataTable = _
View.ToTable("UniqueData", True, _
     MasterTableColumns(0), _
     MasterTableColumns(1), _
     MasterTableColumns(2), _
     MasterTableColumns(3))

Unfortunately, it produces the following... with the duplicates still in place:

 -------------------------------------------------------------
| AppleIndex  | AppleCost  | AppleColor  | AppleDescription   |
 ------------------------------------------------------------
|     1       |     3      | Red         | This is an apple   |
|     2       |     5      | Green       | This is an apple   |
|     3       |     4      | Pink        | This is an apple   |
|     4       |     2      | Yellow      | This is an apple   |
|     5       |     2      | Orange      | This is an apple   |
|     1       |     3      | Red         | This is a duplicate|
|     2       |     5      | Green       | This is a duplicate|
|     3       |     4      | Pink        | This is a duplicate|
|     4       |     2      | Yellow      | This is a duplicate|
|     5       |     2      | Orange      | This is a duplicate|
 -------------------------------------------------------------

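Any ideas?

~~~~~~~~~~~~ Update: ~~~~~~~~~~~~

Jeff M suggested the following code. (Thanks, Jeff.) However, it gives me an error. Does anyone know the syntax to make this work in VB? I have fiddled with it a bit and cannot seem to get it right.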

Dim matches = _
    From mRows In (From row In LinqMasterTable _
        Group row By row(ThePrimaryKey) Into g() _
        Select g.First()) _
    Join sRows In LinqSecondTable _
    On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
    Order By mRows(ThePrimaryKey) _
    Select mRows, sRows

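Error on the third line, at "row(ThePrimaryKey)":

"Range variable name can be inferred only from a simple or qualified name with no arguments."

3 answers:

Answer 0 (score: 1)

Well, the basic problem is not LINQ. It is the fact that your first table contains "duplicates" that are not really duplicates, because in your example every row is unique.

So our question is "How do we identify the duplicates in the original table?" Once that is answered, the rest should be trivial.

For example (in C#, because I am not sure of the VB syntax):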

var Matches = from mRows in LinqMasterTable
                             .Where(r => r.Field<string>("AppleDescription") == "This is an apple")
              join sRows in LinqSecondTable
                   on mRows[ThePrimaryKey] equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows };
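
A minimal VB sketch of the point above, assuming a hypothetical two-row table (the module and function names are illustrative and not part of the original code). Comparing whole rows leaves both rows in place because their descriptions differ, whereas comparing only the key column collapses them to one.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module DistinctVersusKeyDemo

    ' Hypothetical sample: two rows that share a key but differ in description.
    Function BuildApples() As DataTable
        Dim apples As New DataTable("Apples")
        apples.Columns.Add("AppleIndex", GetType(Integer))
        apples.Columns.Add("AppleDescription", GetType(String))
        apples.Rows.Add(1, "This is an apple")
        apples.Rows.Add(1, "This is a duplicate")
        Return apples
    End Function

    ' Distinct over whole rows: DataRowComparer.Default compares every column value.
    Function CountDistinctRows(ByVal table As DataTable) As Integer
        Return table.AsEnumerable().Distinct(DataRowComparer.Default).Count()
    End Function

    ' Distinct over the values of the key column only.
    Function CountDistinctKeys(ByVal table As DataTable, ByVal keyColumn As String) As Integer
        Return table.AsEnumerable().Select(Function(r) r(keyColumn)).Distinct().Count()
    End Function

    Sub Main()
        Dim apples As DataTable = BuildApples()
        Console.WriteLine(CountDistinctRows(apples))                ' whole-row comparison
        Console.WriteLine(CountDistinctKeys(apples, "AppleIndex"))  ' key-only comparison
    End Sub

End Module
```

With this sample the whole-row count is 2 and the key-only count is 1, which is why any fix has to decide which row to keep per key rather than rely on Distinct. The project needs a reference to System.Data.DataSetExtensions for AsEnumerable.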

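Answer 1 (score: 0)

Edit
Here is how I would write the C# LINQ query. This is an alternative version that, instead of using Distinct(), uses a nested query with grouping; it should have similar semantics. It should be easy to convert to VB.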

var matches = from mRows in (from row in LinqMasterTable
                             group row by row[ThePrimaryKey] into g
                             select g.First())
              join sRows in LinqSecondTable
                  on mRows[ThePrimaryKey] equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows };

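My attempt at a VB version of the above:

Edit
As for the latest error, I know exactly how to deal with it. When I worked with VB LINQ, I found that the compiler does not like complex expressions in a grouping. To work around it, assign row(ThePrimaryKey) to a temporary variable and group by that variable. It should work.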

Dim matches = From mRows In (From row In LinqMasterTable _
                             Let grouping = row(ThePrimaryKey) _
                             Group row By grouping Into g = Group _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
                  On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows

Actually, on second look, it turns out that the grouping expression needs a name. The following should work.

Dim matches = From mRows In (From row In LinqMasterTable _
                             Group row By Grouping = row(ThePrimaryKey) Into g = Group _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
                  On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows
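
For reference, here is a self-contained sketch of the same group-then-join shape run against small in-memory tables. The table builder, the module name, and the KeyValuePair result type are assumptions made for illustration; the group is named with `Into g = Group`, which is the documented way to alias a group in VB.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module GroupThenJoinDemo

    ' Hypothetical helper: builds a table with an Integer key column and a text column.
    Function BuildTable(ByVal name As String, ByVal keyColumn As String, _
                        ByVal textColumn As String, ByVal keys() As Integer, _
                        ByVal texts() As String) As DataTable
        Dim table As New DataTable(name)
        table.Columns.Add(keyColumn, GetType(Integer))
        table.Columns.Add(textColumn, GetType(String))
        For i As Integer = 0 To keys.Length - 1
            table.Rows.Add(keys(i), texts(i))
        Next
        Return table
    End Function

    ' Keeps the first master row per key, then inner-joins it to the second table.
    Function JoinFirstPerKey(ByVal master As DataTable, ByVal second As DataTable, _
                             ByVal primaryKey As String, ByVal foreignKey As String) _
                             As List(Of KeyValuePair(Of DataRow, DataRow))
        Dim firstPerKey = From row In master.AsEnumerable() _
                          Group row By KeyValue = row(primaryKey) Into g = Group _
                          Select firstRow = g.First()

        Dim matches = From mRows In firstPerKey _
                      Join sRows In second.AsEnumerable() _
                      On mRows(primaryKey) Equals sRows(foreignKey) _
                      Order By mRows(primaryKey) _
                      Select pair = New KeyValuePair(Of DataRow, DataRow)(mRows, sRows)

        Return matches.ToList()
    End Function

    Sub Main()
        Dim apples As DataTable = BuildTable("Apples", "AppleIndex", "AppleDescription", _
            New Integer() {1, 2, 1, 2}, _
            New String() {"This is an apple", "This is an apple", _
                          "This is a duplicate", "This is a duplicate"})
        Dim oranges As DataTable = BuildTable("Oranges", "OrangeIndex", "OrangeDescription", _
            New Integer() {1, 2}, New String() {"This is an Orange", ""})

        For Each pair In JoinFirstPerKey(apples, oranges, "AppleIndex", "OrangeIndex")
            Console.WriteLine("{0} | {1} | {2}", pair.Key("AppleIndex"), _
                              pair.Key("AppleDescription"), pair.Value("OrangeDescription"))
        Next
    End Sub

End Module
```

Because grouping preserves the order of rows within each group, g.First() returns whichever row for a key appears first in the master table. If a different row should win, the inner query would need its own ordering before the grouping.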

Answer 2 (score: 0)

Declarations, etc.:

Private Sub LinqTwoTableInnerJoin(ByRef qMasterDS As DataSet, _
                                  ByRef qMasterTable As DataTable, _
                                  ByRef qSecondDS As DataSet, _
                                  ByRef qSecondTable As DataTable, _
                                  ByRef qPrimaryKey As String, _
                                  ByRef qForignKey As String, _
                                  ByVal qResultsName As String)

Dim TheMasterTable As String = qMasterTable.TableName
Dim TheSecondTable As String = qSecondTable.TableName
Dim ThePrimaryKey As String = qPrimaryKey
Dim TheForignKey As String = qForignKey
Dim TheNewForignKey As String = ""

MasterTableColumns = GetColumns(qMasterDS, TheMasterTable)
SecondTableColumns = GetColumns(qSecondDS, TheSecondTable)

Dim mColumnCount As Integer = MasterTableColumns.Count
Dim sColumnCount As Integer = SecondTableColumns.Count

Dim ColumnCount As Integer = mColumnCount + sColumnCount

Dim LinqMasterTable = qMasterDS.Tables(TheMasterTable).AsEnumerable
Dim LinqSecondTable = qSecondDS.Tables(TheSecondTable).AsEnumerable

Get the data and sort it by the selected key:

Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
             On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
             Order By mRows(ThePrimaryKey) _
             Select mRows, sRows

Put the results into a dataset table:

' Make sure the dataset is available and/or cleared:
If dsResults.Tables(qResultsName) Is Nothing Then dsResults.Tables.Add(qResultsName)
dsResults.Tables(qResultsName).Clear() : dsResults.Tables(qResultsName).Columns.Clear()

'Adds Master Table Column Names
For x = 0 To MasterTableColumns.Count - 1
    dsResults.Tables(qResultsName).Columns.Add(MasterTableColumns(x))
Next

'Rename Second Table Names if Needed:
For x = 0 To SecondTableColumns.Count - 1
    With dsResults.Tables(qResultsName)
        For y = 0 To .Columns.Count - 1
            If SecondTableColumns(x) = .Columns(y).ColumnName Then
                SecondTableColumns(x) = SecondTableColumns(x) & "_2"
            End If
        Next
    End With
Next

'Make sure that the Forign Key is a Unique Value
If TheForignKey = ThePrimaryKey Then
    TheNewForignKey = TheForignKey & "_2"
Else
    TheNewForignKey = TheForignKey
End If

'Adds Second Table Column Names
For x = 0 To SecondTableColumns.Count - 1 
    dsResults.Tables(qResultsName).Columns.Add(SecondTableColumns(x))
Next

'Copy Results into the Dataset:
For Each Match In Matches

    'Build an array for each row:
    Dim NewRow(ColumnCount - 1) As Object

    'Add the mRow Items:
    For x = 0 To MasterTableColumns.Count - 1
        NewRow(x) = Match.mRows.Item(x)
    Next

    'Add the srow Items:
    For x = 0 To SecondTableColumns.Count - 1
        Dim y As Integer = x + (MasterTableColumns.Count)
        NewRow(y) = Match.sRows.Item(x)
    Next

    'Add the array to dsResults as a Row:
    dsResults.Tables(qResultsName).Rows.Add(NewRow)

Next

Give the user the option to clear out the duplicates:

If chkUnique.Checked = True Then
    ReMoveDuplicates(dsResults.Tables(qResultsName), ThePrimaryKey)
End If

Remove the duplicates, if they want that:

Private Sub ReMoveDuplicates(ByRef SkipTable As DataTable, _
                         ByRef TableKey As String)

    'Make sure that there's data to work with:
    If SkipTable Is Nothing Then Exit Sub
    If TableKey Is Nothing Then Exit Sub

    'Create an ArrayList of rows to delete:
    Dim DeleteRows As New ArrayList()

    'Fill the Array with Row Number of the items equal 
    'to the item above them:
    For x = 1 To SkipTable.Rows.Count - 1
        Dim RowOne As DataRow = SkipTable.Rows(x - 1)
        Dim RowTwo As DataRow = SkipTable.Rows(x)
        'Object.Equals compares the boxed values and tolerates DBNull:
        If Object.Equals(RowTwo.Item(TableKey), RowOne.Item(TableKey)) Then
            DeleteRows.Add(x)
        End If
    Next

    'If there are no hits, exit this sub:
    If DeleteRows.Count < 1 Then
        Exit Sub
    End If

    'Otherwise, remove the rows based on the row count value:
    For x = 0 To DeleteRows.Count - 1

        'Start at the END and count backwards so the duplicate 
        'item's row count value doesn't change with each deleted row
        Dim KillRow As Integer = CInt(DeleteRows((DeleteRows.Count - 1) - x))

        'Delete the row:
        SkipTable.Rows(KillRow).Delete()

    Next
End Sub
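
An alternative sketch, not the routine above: that loop compares each row only with the row directly before it, so it relies on the table already being sorted by the key. Tracking the keys seen so far in a HashSet removes that dependency. The function name and its return value are assumptions for illustration, and the sketch assumes no row is already in the Deleted state.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module RemoveDuplicateKeysDemo

    ' Removes every row whose key value was already seen; returns the number removed.
    Function RemoveDuplicateKeys(ByVal table As DataTable, ByVal keyColumn As String) As Integer
        If table Is Nothing OrElse keyColumn Is Nothing Then Return 0

        Dim seen As New HashSet(Of Object)()
        Dim duplicates As New List(Of DataRow)()

        ' First pass: collect duplicates without modifying the collection being enumerated.
        For Each row As DataRow In table.Rows
            If Not seen.Add(row(keyColumn)) Then
                duplicates.Add(row)
            End If
        Next

        ' Second pass: Rows.Remove drops the row outright instead of marking it Deleted.
        For Each row As DataRow In duplicates
            table.Rows.Remove(row)
        Next

        Return duplicates.Count
    End Function

    Sub Main()
        ' Hypothetical unsorted sample: the duplicate of key 2 is not adjacent to the original.
        Dim apples As New DataTable("Apples")
        apples.Columns.Add("AppleIndex", GetType(Integer))
        apples.Columns.Add("AppleDescription", GetType(String))
        apples.Rows.Add(2, "This is an apple")
        apples.Rows.Add(1, "This is an apple")
        apples.Rows.Add(2, "This is a duplicate")

        Console.WriteLine(RemoveDuplicateKeys(apples, "AppleIndex"))  ' rows removed
        Console.WriteLine(apples.Rows.Count)                          ' rows remaining
    End Sub

End Module
```

For the sample in Main, one row is removed and two remain. Each row is visited once, so the cost stays linear in the row count.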

Then clean up the leftovers:

If Not chkRetainKeys.Checked = True Then 'Removes Forign Key
    dsResults.Tables(qResultsName).Columns.Remove(TheNewForignKey)
End If

'Clear Arrays
MasterTableColumns.Clear()
SecondTableColumns.Clear()

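Final analysis: I ran this against 2 files with 4 columns, 65,535 rows, and some duplicates. Processing time: about 1 second. In fact, loading the fields into memory took longer than parsing the data.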