This might be a bit complicated, but I'm not sure how else to explain it.
I have two tables of sequences:
T1:
+-------+----------+-------+-------+
| state | sequence | gie | match |
+-------+----------+-------+-------+
| a | 1 | fna | |
| c | 2 | fna | |
| b | 3 | fna | |
| d | 1 | dmc | |
| c | 2 | dmc | |
| c | 3 | dmc | |
+-------+----------+-------+-------+
T2:
+-------+----------+-------+-------+
| state | sequence | gie | match |
+-------+----------+-------+-------+
| a | 1 | fna | |
| d | 2 | fna | |
| c | 3 | fna | |
| b | 4 | fna | |
| d | 1 | dmc | |
| c | 2 | dmc | |
+-------+----------+-------+-------+
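For each sequence within a given group (the gie column), I want to find all records from t2 that do not fit into the sequence in t1, and vice versa. In t1, the non-matching record is GIE dmc, sequence 3; in t2, the non-matching record is GIE fna, sequence 2.
I can't figure out how to find the non-matches with SQL, because it isn't clear to me what I should join on. I tried it in VBA as follows: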
'assumes both recordsets are ordered by GIE,sequence
Sub findNonMatch(rs_base As DAO.Recordset, rs_compare As DAO.Recordset)
    rs_base.MoveFirst
    rs_compare.MoveFirst
    While Not rs_base.EOF
        If rs_compare.EOF Then
            updateRS rs_base, False
        'separated into different if-clauses because checking rs_compare!GIE will throw error if rs_compare.eof
        Else
            If rs_compare!gie < rs_base!gie Then
                While rs_compare!gie < rs_base!gie
                    rs_compare.MoveNext
                Wend
            End If
            While (rs_compare!gie = rs_base!gie And rs_compare!state <> rs_base!state And (Not rs_compare.EOF))
                rs_compare.MoveNext
            Wend
            If (rs_compare!state = rs_base!state And rs_compare!gie = rs_base!gie) Then
                updateRS rs_base, True
                rs_compare.MoveNext
            End If
        End If
        rs_base.MoveNext
    Wend
End Sub
Sub updateRS(rs As DAO.Recordset, status As Boolean)
    rs.Edit
    rs!Match = status
    rs.update
End Sub
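This works if the sequence in rs_compare has extra values that are not in rs_base. It does not work if rs_compare is missing a value that is in rs_base: the function tries to find that value and runs to the end of the rs_compare sequence, so no later rs_base value is ever found (the rs_compare cursor has already moved past all the records in that sequence).
Is there a simpler way to find these sequence differences? Maybe some SQL approach I haven't thought of, especially since this algorithm won't scale well to larger datasets?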
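Answer 0 (score: 1)
Take a look at this way of implementing MINUS in MS Access: How can I implement SQL INTERSECT and MINUS operations in MS Access.
What you want to do is left outer join t1 to t2 on sequence, state, and GIE, and select all rows where t2.id is null.
You can then union that query with a second query that left outer joins t2 to t1 on sequence, state, and GIE, and selects all rows where t1.id is null.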
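To make the two anti-joins concrete, here is a minimal sketch that runs them against the sample data from the question. It uses SQLite through Python rather than Access, so the SQL dialect differs slightly, and the id column is an assumption (the tables in the question don't show one). The function name unmatched_rows is made up for illustration.

```python
import sqlite3

# Sample rows from the question as (state, sequence, gie).
T1_ROWS = [("a", 1, "fna"), ("c", 2, "fna"), ("b", 3, "fna"),
           ("d", 1, "dmc"), ("c", 2, "dmc"), ("c", 3, "dmc")]
T2_ROWS = [("a", 1, "fna"), ("d", 2, "fna"), ("c", 3, "fna"), ("b", 4, "fna"),
           ("d", 1, "dmc"), ("c", 2, "dmc")]

# Two left joins, each keeping only the rows with no partner, glued by UNION ALL.
ANTI_JOIN_SQL = """
SELECT 't1' AS source, t1.state, t1.sequence, t1.gie
FROM t1 LEFT JOIN t2
  ON t1.sequence = t2.sequence AND t1.state = t2.state AND t1.gie = t2.gie
WHERE t2.id IS NULL
UNION ALL
SELECT 't2' AS source, t2.state, t2.sequence, t2.gie
FROM t2 LEFT JOIN t1
  ON t2.sequence = t1.sequence AND t2.state = t1.state AND t2.gie = t1.gie
WHERE t1.id IS NULL
"""

def unmatched_rows():
    """Hypothetical helper: load the sample data and run both anti-joins."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("t1", T1_ROWS), ("t2", T2_ROWS)):
        # id is an assumed surrogate key, not shown in the question's tables.
        conn.execute(
            f"CREATE TABLE {name} "
            "(id INTEGER PRIMARY KEY, state TEXT, sequence INTEGER, gie TEXT)"
        )
        conn.executemany(
            f"INSERT INTO {name} (state, sequence, gie) VALUES (?, ?, ?)", rows
        )
    result = conn.execute(ANTI_JOIN_SQL).fetchall()
    conn.close()
    return sorted(result)

if __name__ == "__main__":
    for row in unmatched_rows():
        print(row)
```

Note that because sequence is part of the join key, rows that only shifted position (for example c and b in fna, which sit one place later in t2) are reported too, not just the inserted or dropped ones.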
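Answer 1 (score: 1)
What you have is the longest common subsequence problem for comparing two strings.
Check the VBA code at http://thydzik.com/longest-common-subsequence-implemented-in-vba-visual-basic-for-applications/
You would need to build string1 and string2 from your Access data somehow and pass them to the functions from the link. So for your example:
String1 = acb
String2 = adcb
Check the sample function in the link to see how to use it. The output of the 'getDiff' function will be
=+==
So the difference is at position 2. The + means inserting 'd' into string 1 to make the strings equal.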
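For readers who don't want to follow the link, here is a rough Python sketch of the same idea. It is not the linked getDiff code; lcs_diff is a made-up name, and using '-' for a character that would have to be removed from string 1 is my own assumption about notation.

```python
def lcs_diff(s1, s2):
    """Return '=', '+', '-' markers describing how to turn s1 into s2.

    '=' : characters match, '+' : insert a character from s2 into s1,
    '-' : drop a character from s1 (assumed notation).
    """
    n, m = len(s1), len(s2)
    # table[i][j] holds the LCS length of the suffixes s1[i:] and s2[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk both strings from the front, following the longer remaining LCS.
    marks = []
    i = j = 0
    while i < n and j < m:
        if s1[i] == s2[j]:
            marks.append("=")
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            marks.append("-")
            i += 1
        else:
            marks.append("+")
            j += 1
    marks.extend("-" * (n - i))  # leftovers only in s1
    marks.extend("+" * (m - j))  # leftovers only in s2
    return "".join(marks)

if __name__ == "__main__":
    print(lcs_diff("acb", "adcb"))  # the fna group from the question
    print(lcs_diff("dcc", "dc"))    # the dmc group from the question
```

With the fna states this gives =+==, and with the dmc states it gives ==-, which lines up with the two non-matching records named in the question.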
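Answer 2 (score: 1)
I mocked up some test data myself and it seems to work as expected, so I figured I'd post it. I wanted SQL to do the heavy lifting, and it does, but some code still has to run. If you just want to try it out, I've put the accdb (Access 2007) file here: http://www.sendspace.com/file/eqm5vh. If you do, just enter your data into t1 and t2, then open Module1 and run RunSequences; the sub should take care of the rest.
My code isn't as concise as yours, sigil, and it needs more supporting objects. That said, it will probably scale better than a purely cursor-based solution, because it only has to process one row for each item in each table (more or less, depending on how many duplicates, if any, are in the tables). My idea was to number the rows of each table in order (like ROW_NUMBER in SQL Server) so that I'd have an absolute position to compare against. I did this by inserting all the data from both tables into a temp table with an AutoNumber field, and then using the old DCount trick to get the row IDs. Everything else builds on that dataset. I won't explain it to death; I'll let you try it and see whether it works, but I'm posting my code below in case anyone wants to review it.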
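In case the DCount trick is unfamiliar: the row number of a record is simply the count of records from the same source table whose AutoNumber ID is less than or equal to its own. Here is a small sketch of that counting idea, using SQLite through Python with a correlated COUNT(*) subquery standing in for DCount. The function name number_rows and the trimmed-down two-column tblTemp are assumptions for illustration only.

```python
import sqlite3

def number_rows(rows):
    """rows: (source_table, state) tuples, already in the desired order.

    Returns (SourceTable, SequenceID, ID, State) tuples, where SequenceID
    restarts at 1 for each source table.
    """
    conn = sqlite3.connect(":memory:")
    # ID plays the role of the Access AutoNumber field.
    conn.execute(
        "CREATE TABLE tblTemp "
        "(ID INTEGER PRIMARY KEY AUTOINCREMENT, SourceTable TEXT, State TEXT)"
    )
    conn.executemany(
        "INSERT INTO tblTemp (SourceTable, State) VALUES (?, ?)", rows
    )
    # Count the rows of the same source with an ID up to and including this one.
    sql = """
    SELECT o.SourceTable,
           (SELECT COUNT(*) FROM tblTemp AS i
             WHERE i.SourceTable = o.SourceTable AND i.ID <= o.ID) AS SequenceID,
           o.ID,
           o.State
    FROM tblTemp AS o
    ORDER BY o.ID
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    sample = [("t1", "a"), ("t1", "c"), ("t1", "b"), ("t2", "a"), ("t2", "d")]
    for row in number_rows(sample):
        print(row)
```

The count is recomputed for every row, so on large tables this is the part most likely to get slow; an index on (SourceTable, ID) would be the first thing to try.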
Tables: (the table listing was not preserved; the queries below use t1, t2, and tblTemp)
Queries:
qryT1T2_Ordered_INSERT:
INSERT INTO tblTemp
SELECT *
FROM (SELECT "t1" AS SourceTable, t1.State, t1.Sequence, t1.GIE, t1.Match
FROM t1
UNION ALL
SELECT "t2" AS SourceTable, t2.State, t2.Sequence, t2.GIE, t2.Match
FROM t2) AS [%$##@_Alias]
ORDER BY SourceTable, GIE DESC , Sequence;
qryT1_Sequenced:
SELECT DCount("*","tblTemp","[SourceTable] = 't1' AND [ID] <= " & [ID]) AS SequenceID, tblTemp.ID, tblTemp.State, tblTemp.Sequence, tblTemp.GIE, tblTemp.Match, [State] & "_" & [GIE] AS JoinValue
FROM tblTemp
WHERE tblTemp.SourceTable="t1";
qryT1_Compare:
SELECT qryT1_Sequenced.SequenceID AS MySequenceID, qryT2_Sequenced.SequenceID AS OtherSequenceID, qryT1_Sequenced.ID AS MyID, qryT2_Sequenced.ID AS OtherID, qryT2_Sequenced.JoinValue
FROM qryT1_Sequenced LEFT JOIN qryT2_Sequenced ON qryT1_Sequenced.JoinValue = qryT2_Sequenced.JoinValue
ORDER BY qryT1_Sequenced.SequenceID, qryT2_Sequenced.ID;
qryT2_Sequenced:
SELECT DCount("*","tblTemp","[SourceTable] = 't2' AND [ID] <= " & [ID]) AS SequenceID, tblTemp.ID, tblTemp.State, tblTemp.Sequence, tblTemp.GIE, tblTemp.Match, [State] & "_" & [GIE] AS JoinValue
FROM tblTemp
WHERE tblTemp.SourceTable="t2";
qryT2_Compare:
SELECT qryT2_Sequenced.SequenceID AS MySequenceID, qryT1_Sequenced.SequenceID AS OtherSequenceID, qryT2_Sequenced.ID AS MyID, qryT1_Sequenced.ID AS OtherID, qryT2_Sequenced.JoinValue
FROM qryT2_Sequenced LEFT JOIN qryT1_Sequenced ON qryT2_Sequenced.JoinValue=qryT1_Sequenced.JoinValue
ORDER BY qryT2_Sequenced.SequenceID, qryT1_Sequenced.ID;
qryT1T2_Compared_FINAL:
SELECT tblTemp.SourceTable, tblTemp.State, tblTemp.Sequence, tblTemp.GIE, tblTemp.Match
FROM tblTemp
WHERE tblTemp.Match="No"
ORDER BY tblTemp.SourceTable, tblTemp.GIE DESC , tblTemp.Sequence;
Module:
Public Sub RunSequences()
    On Error GoTo ErrorHandler
    'db was used without a declaration; declare it so this compiles under Option Explicit.
    Dim db As DAO.Database
    DoCmd.SetWarnings False
    Set db = CurrentDb()
    'Do our setup:
    '1. Clear our temp table.
    db.Execute "DELETE * FROM [tblTemp]"
    '2. Insert data from t1 and t2 into temp table.
    DoCmd.OpenQuery "qryT1T2_Ordered_INSERT"
    '3. Now process the sequence.
    ReportSequences "qryT1_Compare"
    ReportSequences "qryT2_Compare"
    '4. Open non-matched report.
    DoCmd.OpenQuery "qryT1T2_Compared_FINAL"
ExitMe:
    DoCmd.SetWarnings True
    Set db = Nothing
    Exit Sub
ErrorHandler:
    Debug.Print Err.Number & ": " & Err.Description
    GoTo ExitMe
End Sub
'----
Public Sub ReportSequences(strSourceQuery As String)
    On Error GoTo ErrorHandler
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim intLastOtherSequenceID As Integer
    Dim dicMasterSequenceIDs As New Scripting.Dictionary
    Dim dicComparedSequenceIDs As New Scripting.Dictionary
    Dim strSQL_UpdateYes As String
    Dim strSQL_UpdateNo As String
    'Running all my updates inline, but you can break this out.
    strSQL_UpdateYes = "UPDATE [tblTemp] SET [Match] = 'Yes' WHERE [ID] = @ID"
    strSQL_UpdateNo = "UPDATE [tblTemp] SET [Match] = 'No' WHERE [ID] = @ID"
    Set db = CurrentDb()
    Set rst = db.OpenRecordset(strSourceQuery, dbOpenDynaset)
    With rst
        Do Until .EOF
            'Need this to keep track of Master Sequence IDs (MyID) we've processed
            'successfully.
            'If there is more than one match for MyID, we want only to take the first
            'match that fulfills the condition of being next in the sequence,
            'not jump ahead.
            If dicMasterSequenceIDs.Exists(.Fields("MyID").Value) = True Then
                If dicMasterSequenceIDs(.Fields("MyID").Value) = "Done" Then
                    GoTo MoveNext
                End If
            Else
                dicMasterSequenceIDs.Add .Fields("MyID").Value, ""
            End If
            Select Case IsNull(.Fields("OtherID"))
                Case True
                    'If OtherID is null, it means no match in other table, so Match is
                    'automatically no.
                    db.Execute Replace(strSQL_UpdateNo, "@ID", .Fields("MyID"))
                Case False
                    'Check to see if current OtherSequenceID is greater than the old
                    'one...
                    '(If it is, it is in sequence).
                    If intLastOtherSequenceID < CInt(.Fields("OtherSequenceID")) Then
                        'Use the dictionary to keep track of distinct OtherSequenceIDs we've already added.
                        If dicComparedSequenceIDs.Exists(.Fields("OtherSequenceID").Value) = False Then
                            dicComparedSequenceIDs.Add .Fields("OtherSequenceID").Value, ""
                            db.Execute Replace(strSQL_UpdateYes, "@ID", .Fields("MyID"))
                            dicMasterSequenceIDs(.Fields("MyID").Value) = "Done"
                        'If it's a dupe, means the sequence is broken.
                        Else
                            db.Execute Replace(strSQL_UpdateNo, "@ID", .Fields("MyID"))
                        End If
                    Else
                        'If the old one is equal or greater, means sequence is broken.
                        db.Execute Replace(strSQL_UpdateNo, "@ID", .Fields("MyID"))
                    End If
                    intLastOtherSequenceID = .Fields("OtherSequenceID")
            End Select
MoveNext:
            .MoveNext
        Loop
    End With
ExitMe:
    Set dicComparedSequenceIDs = Nothing
    Set rst = Nothing
    Set db = Nothing
    Exit Sub
ErrorHandler:
    Debug.Print Err.Number & ": " & Err.Description
    GoTo ExitMe
End Sub
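Anyway, I hope it works for you. If not, hopefully it gives you some more ideas.
Edit: Realized there was a problem with the logic in the ReportSequences sub. When we get several matches in the other sequence, we want to take only the earliest one that is still in sequence and meets the criteria. Added that. The new accdb is here: http://www.sendspace.com/file/hcdxvp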
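The "take the earliest match that is still in sequence" rule from the edit can be sketched on its own, outside Access. This is a simplified Python illustration of the rule for a single GIE group, not a port of ReportSequences; mark_matches is a made-up name, and it works on plain lists of states.

```python
def mark_matches(mine, other):
    """Return one True/False flag per item in mine.

    An item matches if the same state occurs in other at or after the
    position following the previous match; the earliest such position wins.
    """
    flags = []
    next_free = 0  # first position in other that has not been consumed yet
    for state in mine:
        try:
            pos = other.index(state, next_free)
        except ValueError:
            # No occurrence from next_free onward: mark as a non-match and,
            # importantly, leave next_free where it was.
            flags.append(False)
            continue
        flags.append(True)
        next_free = pos + 1
    return flags

if __name__ == "__main__":
    t1_fna, t2_fna = list("acb"), list("adcb")
    t1_dmc, t2_dmc = list("dcc"), list("dc")
    print(mark_matches(t1_fna, t2_fna))
    print(mark_matches(t2_fna, t1_fna))
    print(mark_matches(t1_dmc, t2_dmc))
    print(mark_matches(t2_dmc, t1_dmc))
```

Leaving the position untouched on a miss is what avoids the "cursor ran off the end" problem described in the question. Being greedy, it won't always find the largest possible set of matches (for mine = x, a, b against other = a, b, x it only keeps x), so the longest common subsequence approach from the other answer is the safer choice if that matters.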