How can I compare sequences using SQL?

Time: 2014-06-03 20:59:57

Tags: sql vba ms-access

This may be a bit complicated, and I'm not sure how best to explain it.

I have two tables of sequences:

T1:

+-------+----------+-------+-------+
| state | sequence | gie   | match |
+-------+----------+-------+-------+
| a     |        1 | fna   |       |
| c     |        2 | fna   |       |
| b     |        3 | fna   |       |
| d     |        1 | dmc   |       |
| c     |        2 | dmc   |       |
| c     |        3 | dmc   |       |
+-------+----------+-------+-------+

T2:

+-------+----------+-------+-------+
| state | sequence | gie   | match |
+-------+----------+-------+-------+
| a     |        1 | fna   |       |
| d     |        2 | fna   |       |
| c     |        3 | fna   |       |
| b     |        4 | fna   |       |
| d     |        1 | dmc   |       |
| c     |        2 | dmc   |       |
+-------+----------+-------+-------+

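For each sequence of a given group, I want to find all records from t2 that don't fit into the sequence in t1, and vice versa. In t1, the non-matching record is GIE dmc, sequence 3; in t2, the non-matching record is GIE fna, sequence 2.

I can't figure out how to find the non-matches with SQL, because it isn't clear to me what I should join on. I tried it in VBA as follows: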

'assumes both recordsets are ordered by GIE,sequence
Sub findNonMatch(rs_base As DAO.Recordset, rs_compare As DAO.Recordset)

rs_base.MoveFirst
rs_compare.MoveFirst

While Not rs_base.EOF
    If rs_compare.EOF Then
        updateRS rs_base, False
    'separated into different if-clauses because checking rs_compare!GIE will throw error if rs_compare.eof
    Else
        If rs_compare!gie < rs_base!gie Then
            While rs_compare!gie < rs_base!gie
                rs_compare.MoveNext
            Wend
        End If

        While (rs_compare!gie = rs_base!gie And rs_compare!state <> rs_base!state And (Not rs_compare.EOF))
            rs_compare.MoveNext
        Wend
        If (rs_compare!state = rs_base!state And rs_compare!gie = rs_base!gie) Then
            updateRS rs_base, True
            rs_compare.MoveNext
        End If
    End If
    rs_base.MoveNext

Wend

End Sub

Sub updateRS(rs As DAO.Recordset, status As Boolean)
    rs.Edit
    rs!Match = status
    rs.Update
End Sub

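This works if the sequence in rs_compare has extra values that aren't in rs_base. It does not work correctly if rs_compare is missing a value that is in rs_base: the function tries to find that value and runs to the end of the rs_compare sequence, so none of the following rs_base values can be found (the rs_compare cursor has already moved past every record in that sequence).

Is there a simpler way to find these sequence differences? Perhaps some SQL approach I haven't thought of, especially since this algorithm won't scale well to larger datasets?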

3 answers:

Answer 0: (score: 1)

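Take a look at this way of implementing MINUS in MS Access: How can I implement SQL INTERSECT and MINUS operations in MS Access

What you want to do is left outer join t1 to t2 on sequence, state, and GIE, and select all rows where t2.id is null.

You can then union that query with a second query that left outer joins t2 to t1 on sequence, state, and GIE, and selects all rows where t1.id is null.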

Answer 1: (score: 1)

What you have is the longest common subsequence problem, applied to comparing two strings.

See the VBA code at http://thydzik.com/longest-common-subsequence-implemented-in-vba-visual-basic-for-applications/

You will need to build string1 and string2 from your Access data somehow and pass them to the function from the link. So for your example:

  String1 = acb
  String2 = adcb

Check the example function in the link to see how to use it. The output of the 'getDiff' function will be

 =+==

So the difference is at position 2. The + means that inserting 'd' into string 1 makes the strings equal.
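For readers who want to experiment without the linked VBA, here is a minimal Python sketch of a diff based on the longest common subsequence. It is not the linked implementation: the function name `get_diff` and the use of `-` for a character that appears only in the first string are assumptions; only `=` and `+` are taken from the output shown above.

```python
def get_diff(s1: str, s2: str) -> str:
    """Return a diff string built from the longest common subsequence.

    '=' marks a character shared by both strings,
    '+' a character present only in s2,
    '-' a character present only in s1 (assumed notation).
    """
    n, m = len(s1), len(s2)
    # lcs[i][j] holds the LCS length of the suffixes s1[i:] and s2[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk both strings from the front, following the table.
    out = []
    i = j = 0
    while i < n and j < m:
        if s1[i] == s2[j]:
            out.append("=")
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("-")  # skipping s1[i] keeps the LCS reachable
            i += 1
        else:
            out.append("+")  # skipping s2[j] keeps the LCS reachable
            j += 1
    # Whatever is left over in either string has no partner.
    out.extend("-" * (n - i))
    out.extend("+" * (m - j))
    return "".join(out)

if __name__ == "__main__":
    print(get_diff("acb", "adcb"))  # fna group from the question
    print(get_diff("dcc", "dc"))    # dmc group from the question
```

With one string per GIE group (the states concatenated in sequence order), the fna group gives `=+==` and the dmc group gives `==-`, which corresponds to the two non-matching records named in the question.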

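Answer 2: (score: 1)

I mocked up some test data myself, and it seems to run as expected, so I thought I'd post it. I wanted SQL to do the heavy lifting, and it does, but some code still has to run. If you just want to try it out, I've put the accdb (Access 2007) file here: http://www.sendspace.com/file/eqm5vh. If you do, just enter your data into t1 and t2, then open Module1 and run RunSequences; the sub should take care of the rest.

My code isn't as concise as yours, sigil, and it needs more helper objects. That said, it will probably scale better than a purely cursor-based solution, since it only has to process one row for each item in each table (more or less, depending on how many duplicates the tables contain, if any). My idea was to number the rows of each table (like ROW_NUMBER in SQL Server) so that I'd have an absolute position to compare against. I did this by inserting all of the data from both tables into a temp table with an AutoNumber field, and then using the old DCount trick to get the row IDs. The rest builds on that dataset. Rather than explain it to death, I'll let you try it and see whether it works, but I'll post my code below in case anyone wants to review it.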
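The row-numbering idea can be sketched outside Access as follows. This is a minimal illustration in Python with the standard-library sqlite3 module, under stated assumptions: a correlated `COUNT(*)` subquery stands in for Access's `DCount`, an `INTEGER PRIMARY KEY AUTOINCREMENT` column stands in for the AutoNumber field, and the rows are listed in the order the insert query's ORDER BY (SourceTable, GIE DESC, Sequence) would produce. The function name `sequenced` is hypothetical.

```python
import sqlite3

# (SourceTable, State, Sequence, GIE), pre-sorted by SourceTable, GIE DESC, Sequence.
ROWS = [
    ("t1", "a", 1, "fna"), ("t1", "c", 2, "fna"), ("t1", "b", 3, "fna"),
    ("t1", "d", 1, "dmc"), ("t1", "c", 2, "dmc"), ("t1", "c", 3, "dmc"),
    ("t2", "a", 1, "fna"), ("t2", "d", 2, "fna"), ("t2", "c", 3, "fna"),
    ("t2", "b", 4, "fna"), ("t2", "d", 1, "dmc"), ("t2", "c", 2, "dmc"),
]

# Number each row within its source table by counting rows with a lower or equal ID.
SEQUENCED = """
SELECT (SELECT COUNT(*) FROM tblTemp AS x
        WHERE x.SourceTable = t.SourceTable AND x.ID <= t.ID) AS SequenceID,
       t.ID,
       t.State || '_' || t.GIE AS JoinValue
FROM tblTemp AS t
WHERE t.SourceTable = ?
ORDER BY t.ID
"""

def sequenced(source_table):
    """Return (SequenceID, ID, JoinValue) for every row of one source table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblTemp (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "SourceTable TEXT, State TEXT, Sequence INTEGER, GIE TEXT)"
    )
    con.executemany(
        "INSERT INTO tblTemp (SourceTable, State, Sequence, GIE) VALUES (?, ?, ?, ?)",
        ROWS,
    )
    result = con.execute(SEQUENCED, (source_table,)).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in sequenced("t2"):
        print(row)
```

The t2 rows receive IDs 7 through 12 in the temp table but SequenceID values 1 through 6, which is the absolute per-table position the comparison queries rely on.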

Table:

tblTemp

Queries:

qryT1T2_Ordered_INSERT:

INSERT INTO tblTemp
SELECT *
FROM (SELECT "t1" AS SourceTable, t1.State, t1.Sequence, t1.GIE, t1.Match
FROM t1

UNION ALL

SELECT "t2" AS SourceTable, t2.State, t2.Sequence, t2.GIE, t2.Match
FROM t2)  AS [%$##@_Alias]
ORDER BY SourceTable, GIE DESC , Sequence;

qryT1_Sequenced:

SELECT DCount("*","tblTemp","[SourceTable] = 't1' AND [ID] <= " & [ID]) AS SequenceID, tblTemp.ID, tblTemp.State, tblTemp.Sequence, tblTemp.GIE, tblTemp.Match, [State] & "_" & [GIE] AS JoinValue
FROM tblTemp
WHERE tblTemp.SourceTable="t1";

qryT1_Compare:

SELECT qryT1_Sequenced.SequenceID AS MySequenceID, qryT2_Sequenced.SequenceID AS OtherSequenceID, qryT1_Sequenced.ID AS MyID, qryT2_Sequenced.ID AS OtherID, qryT2_Sequenced.JoinValue
FROM qryT1_Sequenced LEFT JOIN qryT2_Sequenced ON qryT1_Sequenced.JoinValue = qryT2_Sequenced.JoinValue
ORDER BY qryT1_Sequenced.SequenceID, qryT2_Sequenced.ID;

qryT2_Sequenced:

SELECT DCount("*","tblTemp","[SourceTable] = 't2' AND [ID] <= " & [ID]) AS SequenceID, tblTemp.ID, tblTemp.State, tblTemp.Sequence, tblTemp.GIE, tblTemp.Match, [State] & "_" & [GIE] AS JoinValue
FROM tblTemp
WHERE tblTemp.SourceTable="t2";

qryT2_Compare:

SELECT qryT2_Sequenced.SequenceID AS MySequenceID, qryT1_Sequenced.SequenceID AS OtherSequenceID, qryT2_Sequenced.ID AS MyID, qryT1_Sequenced.ID AS OtherID, qryT2_Sequenced.JoinValue
FROM qryT2_Sequenced LEFT JOIN qryT1_Sequenced ON qryT2_Sequenced.JoinValue=qryT1_Sequenced.JoinValue
ORDER BY qryT2_Sequenced.SequenceID, qryT1_Sequenced.ID;

qryT1T2_Compared_FINAL:

SELECT tblTemp.SourceTable, tblTemp.State, tblTemp.Sequence, tblTemp.GIE, tblTemp.Match
FROM tblTemp
WHERE tblTemp.Match="No"
ORDER BY tblTemp.SourceTable, tblTemp.GIE DESC , tblTemp.Sequence;

Module:

Public Sub RunSequences()
On Error GoTo ErrorHandler

    DoCmd.SetWarnings False

    'Declare db so the sub compiles under Option Explicit.
    Dim db As DAO.Database
    Set db = CurrentDb()

    'Do our setup:
    '1. Clear our temp table.
    db.Execute "DELETE * FROM [tblTemp]"

    '2. Insert data from t1 and t2 into temp table.
    DoCmd.OpenQuery "qryT1T2_Ordered_INSERT"

    '3. Now process the sequence.
    ReportSequences "qryT1_Compare"
    ReportSequences "qryT2_Compare"

    '4. Open non-matched report.
    DoCmd.OpenQuery "qryT1T2_Compared_FINAL"

ExitMe:
    DoCmd.SetWarnings True

    Exit Sub
ErrorHandler:
    Debug.Print Err.Number & ": " & Err.Description
    GoTo ExitMe
End Sub

'----

Public Sub ReportSequences(strSourceQuery As String)
On Error GoTo ErrorHandler

    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim intLastOtherSequenceID As Integer
    Dim dicMasterSequenceIDs As New Scripting.Dictionary
    Dim dicComparedSequenceIDs As New Scripting.Dictionary
    Dim strSQL_UpdateYes As String
    Dim strSQL_UpdateNo As String

    'Running all my updates inline, but you can break this out.
    strSQL_UpdateYes = "UPDATE [tblTemp] SET [Match] = 'Yes' WHERE [ID] = @ID"
    strSQL_UpdateNo = "UPDATE [tblTemp] SET [Match] = 'No' WHERE [ID] = @ID"

    Set db = CurrentDb()
    Set rst = db.OpenRecordset(strSourceQuery, dbOpenDynaset)

    With rst
        Do Until .EOF
            'Need this to keep track of Master Sequence IDs (MyID) we've processed
            'successfully.
            'If there is more than one match for MyID, we want only to take the first
            'match that fulfills the condition of being next in the sequence,
            'not jump ahead.
            If dicMasterSequenceIDs.Exists(.Fields("MyID").Value) = True Then
                If dicMasterSequenceIDs(.Fields("MyID").Value) = "Done" Then
                    GoTo MoveNext
                End If
            Else
                dicMasterSequenceIDs.Add .Fields("MyID").Value, ""
            End If

            Select Case IsNull(.Fields("OtherID"))
                Case True
                    'If OtherID is null, it means no match in other table, so Match is
                    'automatically no.
                    db.Execute Replace(strSQL_UpdateNo, "@ID", .Fields("MyID"))
                Case False
                    'Check to see if current OtherSequenceID is greater than the old
                    'one...
                    '(If it is, it is in sequence).
                    If intLastOtherSequenceID < CInt(.Fields("OtherSequenceID")) Then
                        'Use the dictionary to keep track of distinct OtherSequenceIDs we've already added.
                        If dicComparedSequenceIDs.Exists(.Fields("OtherSequenceID").Value) = False Then
                            dicComparedSequenceIDs.Add .Fields("OtherSequenceID").Value, ""
                            db.Execute Replace(strSQL_UpdateYes, "@ID", .Fields("MyID"))
                            dicMasterSequenceIDs(.Fields("MyID").Value) = "Done"
                        'If it's a dupe, means the sequence is broken.
                        Else
                            db.Execute Replace(strSQL_UpdateNo, "@ID", .Fields("MyID"))
                        End If
                    Else
                        'If the old one is equal or greater, means sequence is broken.
                        db.Execute Replace(strSQL_UpdateNo, "@ID", .Fields("MyID"))
                    End If

                    intLastOtherSequenceID = .Fields("OtherSequenceID")
            End Select

MoveNext:
            .MoveNext
        Loop
    End With

ExitMe:
    Set dicComparedSequenceIDs = Nothing
    Set rst = Nothing
    Set db = Nothing

    Exit Sub
ErrorHandler:
    Debug.Print Err.Number & ": " & Err.Description
    GoTo ExitMe

End Sub

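Anyway, I hope it works for you. If not, hopefully it gives you some more ideas.

Edit: I realized there was a problem with the logic in the sub ReportSequences. In cases where we get several matches in the other sequence, we want to take only the earliest one that is in sequence and meets the criteria. That has been added. The new accdb is here: http://www.sendspace.com/file/hcdxvp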