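I'm ultimately trying to compare data from two .csv files and find only the rows in table1 whose data has changed.
I'd like to do this with a LINQ query. I'm using VB.NET and an OleDbDataAdapter to fill two DataTables with the .csv data.
The number of columns in each table always matches, but the number of rows may not. I don't know the column names, but I will know the index of the primary key column, e.g. table1.Field(Of String)(4). I've kept the column count small in the example, but keep in mind that the number of columns in my .csv files varies and may exceed 50.
Table 1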
"John", "Adams", "51 Orange St", "Mechanic", "ID0004", "45.00", "1987"
"Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", "27.00", "1991"
Table 2
"John", "Adams", "51 Orange St", "Mechanic", "ID0004", "45.00", "1987"
"Nancy", "Wilson", "227 Groove Ln", "HR", "ID0029", "27.00", "1991"
"Pat", "Rita", "51 Orange St", "Mechanic", "ID0017", "21.00", "1987"
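Expected result:
The two tables have two matches on .Field(Of String)(4), which is our key column. In this case, however, we only want one row returned: "Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", "27.00", "1991", because one column value in that row has changed.
Treat table2 as the master table, which does not change. We only care about returning rows in table1 that have a matching key in table2, and only when any of their data has changed. Thanks!
Answer 0 (score: 0)
If you're doing this in memory and want it optimized, I don't think a LINQ query will give you the best results. I would put one of the tables into a dictionary that maps primary keys to DataRows, so that you can quickly look up the matching row while looping through the other table. However, if you do want to use a LINQ query for this, see the following sample code: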
Imports System.Data
Imports System.Linq

Module Module1
    Sub Main()
        Dim t1 As New DataTable
        Dim t2 As New DataTable
        ' This sample uses the first two columns as a composite primary key.
        ' In the question's data the key would be the column at index 4.
        t1.PrimaryKey = {t1.Columns.Add(), t1.Columns.Add()}
        For c As Integer = 1 To 5 : t1.Columns.Add() : Next
        t2.PrimaryKey = {t2.Columns.Add(), t2.Columns.Add()}
        For c As Integer = 1 To 5 : t2.Columns.Add() : Next
        t1.Rows.Add("John", "Adams", "51 Orange St", "Mechanic", "ID0004", 45.0, 1987)
        t2.Rows.Add("John", "Adams", "51 Orange St", "Mechanic", "ID0004", 45.0, 1987)
        t1.Rows.Add("Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", 27.0, 1991)
        t2.Rows.Add("Nancy", "Wilson", "227 Groove Ln", "HR", "ID0029", 27.0, 1991)
        t2.Rows.Add("Pat", "Rita", "51 Orange St", "Mechanic", "ID0017", 21.0, 1987)
        ' For each row in t2, collect the t1 rows with the same key, then keep
        ' the t2 rows that have no match in t1 or whose match differs.
        Dim diffs = _
            From row2 In t2 Group Join row1 In t1 _
            On row1(t1.PrimaryKey(0)) Equals row2(t2.PrimaryKey(0)) _
            And row1(t1.PrimaryKey(1)) Equals row2(t2.PrimaryKey(1)) _
            Into Group _
            Where Not Group.Any() OrElse RowsDifferent(Group.Single(), row2)
        For Each diff In diffs
            If diff.Group.Any() Then
                ' The key exists in both tables, but some column differs.
                For Each col In diff.row2.ItemArray
                    Console.Write(col.ToString() & ",")
                Next
                Console.WriteLine()
            Else
                ' The key exists only in t2.
                For Each col In diff.row2.ItemArray
                    Console.Write(col.ToString() & ",")
                Next
                Console.WriteLine()
            End If
        Next
    End Sub

    ' Returns True if any column value differs between the two rows.
    Private Function RowsDifferent(r1 As DataRow, r2 As DataRow) As Boolean
        For i As Integer = 0 To r1.Table.Columns.Count - 1
            If Not r1(i).Equals(r2(i)) Then Return True
        Next
        Return False
    End Function
End Module
It looks like, thanks to enhancements in .NET Framework 4.0, you can even use a built-in function instead of defining your own RowsDifferent function:
Dim diffs = _
    From row2 In t2 Group Join row1 In t1 _
    On row1(t1.PrimaryKey(0)) Equals row2(t2.PrimaryKey(0)) _
    And row1(t1.PrimaryKey(1)) Equals row2(t2.PrimaryKey(1)) _
    Into Group Where Not Group.Any OrElse _
    Not Group.Single.ItemArray.SequenceEqual(row2.ItemArray)
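The dictionary approach mentioned at the start of this answer could be sketched roughly as follows. This is only a minimal illustration, not a tested implementation: the names DictionaryDiffSketch, ChangedRows, and keyIndex are made up for the example, the key is assumed to be a single column (index 4, as in the question), and keys are assumed to be unique within table2.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module DictionaryDiffSketch
    ' Hypothetical helper: returns the rows of table1 whose key also exists
    ' in table2 but whose values differ in at least one column.
    ' keyIndex is the zero-based index of the key column.
    Function ChangedRows(table1 As DataTable, table2 As DataTable, _
                         keyIndex As Integer) As List(Of DataRow)
        ' Index table2 by key so that each lookup is a hash lookup.
        ' Assumption: keys are unique; if not, the last row wins.
        Dim lookup As New Dictionary(Of String, DataRow)
        For Each row2 As DataRow In table2.Rows
            lookup(row2(keyIndex).ToString()) = row2
        Next

        Dim changed As New List(Of DataRow)
        For Each row1 As DataRow In table1.Rows
            Dim match As DataRow = Nothing
            ' Keep the row only if the key is found and any column differs.
            If lookup.TryGetValue(row1(keyIndex).ToString(), match) AndAlso _
               Not row1.ItemArray.SequenceEqual(match.ItemArray) Then
                changed.Add(row1)
            End If
        Next
        Return changed
    End Function

    Sub Main()
        Dim t1 As New DataTable
        Dim t2 As New DataTable
        For c As Integer = 1 To 7
            t1.Columns.Add()
            t2.Columns.Add()
        Next
        t1.Rows.Add("John", "Adams", "51 Orange St", "Mechanic", "ID0004", "45.00", "1987")
        t1.Rows.Add("Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", "27.00", "1991")
        t2.Rows.Add("John", "Adams", "51 Orange St", "Mechanic", "ID0004", "45.00", "1987")
        t2.Rows.Add("Nancy", "Wilson", "227 Groove Ln", "HR", "ID0029", "27.00", "1991")
        t2.Rows.Add("Pat", "Rita", "51 Orange St", "Mechanic", "ID0017", "21.00", "1987")

        For Each row As DataRow In ChangedRows(t1, t2, 4)
            Console.WriteLine(String.Join(",", row.ItemArray))
        Next
    End Sub
End Module
```

With the question's sample data, this should report only the Nancy Wilson row from table 1, since John's row is identical in both tables and Pat's key appears only in table 2. Each table is read once, so the work grows with the total number of rows rather than with the product of the two row counts.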
Answer 1 (score: 0)
Here is an alternative solution that doesn't use a LINQ query and gives the expected result. However, I'm not sure how reliable it is.
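' Requires Imports System.IO and Imports System.Linq.
Dim lstTable2 As List(Of String) = File.ReadAllLines("C:\Table2.txt").ToList
Dim lstTable1 As List(Of String) = File.ReadAllLines("C:\Table1.txt").ToList
' Except keeps the distinct lines of table 1 that have no exact textual match
' in table 2, so it also returns table 1 rows whose key is missing from table 2.
Dim lstChanges As List(Of String) = New List(Of String)(lstTable1.Except(lstTable2))
File.WriteAllLines("C:\Changes.txt", lstChanges.ToArray())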