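I'm looking for help on how to speed up the bit of code below because, as it stands, it takes far too long to do its job. Any suggestions would be much appreciated. Thanks in advance!
The code below is a stripped-down version of the real thing, but everything important should be there. The code works; however, it is very slow on even a moderately sized data set. Needless to say, the main culprit is the second, nested recordset/SQL call. The LIKE operator is part of the slowdown, but I'm more concerned about the nesting, and I think the LIKE operator will be required for what we're trying to accomplish. I tried folding the second SQL call into the first, but I didn't see a clean way of doing so.
Platform: Classic ASP, VBScript, MS Access DB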
' Go through all people in the table.
sql1 = "SELECT ID, FN, LN, Email FROM Table1"
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
    id = rst1.Fields("ID").Value
    fn = rst1.Fields("FN").Value
    ln = rst1.Fields("LN").Value
    email = rst1.Fields("Email").Value
    If IsNull(email) Or IsEmpty(email) Then
        email = ""
    End If
    ' ----- Figure out if any other people in the table has a similar name or is using the same e-mail address.
    ' Capture both the ID of those other people as well as figure out the total number of possible duplicates.
    sql2 = "SELECT ID FROM Table1"
    sql2 = sql2 & " WHERE"
    sql2 = sql2 & " ID <> " & id
    sql2 = sql2 & " AND"
    sql2 = sql2 & " ("
    sql2 = sql2 & " FN & ' ' & LN LIKE '%" & Replace(fn & " " & ln, "'", "''") & "%'"
    If email <> "" Then
        sql2 = sql2 & " OR"
        sql2 = sql2 & " Email LIKE '%" & Replace(email, "'", "''") & "%'"
    End If
    sql2 = sql2 & " )"
    Call rst2.Open(sql2, cnx, 0, 1)
    numDups = 0
    possibleDups = ""
    While Not rst2.EOF
        numDups = numDups + 1
        If possibleDups <> "" Then
            possibleDups = possibleDups & ", "
        End If
        possibleDups = possibleDups & rst2.Fields("ID").Value
        Call rst2.MoveNext()
    Wend
    Call rst2.Close()
    ' ----- End nest query.
    Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
    Call rst1.MoveNext()
Wend
Call rst1.Close()
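Update 1:
As requested, here is a bit more information on the sample data and the expected output. Table1 is basically a table with the fields id, fn, ln, and email. id is an auto-generated ID representing the entry, and fn/ln are the first and last name of the person in that entry. The expected output is as coded, e.g.: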
John Doe has 3 possible duplicates (1342, 3652, 98325)
John Doe has 3 possible duplicates (986, 3652, 98325)
John Doe has 3 possible duplicates (986, 1342, 98325)
John Doe has 3 possible duplicates (986, 1342, 3652)
Sam Jones has 0 possible duplicates ()
Jane Smith has 2 possible duplicates (234, 10562)
Jane Smith has 2 possible duplicates (155, 10562)
Jane Smith has 2 possible duplicates (155, 234)
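The numbers in parentheses are the IDs that appear to be duplicates of each person. A possible duplicate is another entry in the same table that appears to share the same name or e-mail address. For example, based on name alone, there could be 4 John Does and 3 Jane Smiths in the table.
Ideally, only one SQL query would be needed, to cut down on the round trips caused by the recordset calls. However, Access is limited in features compared to regular SQL Server, and I'm not sure what I'm missing that might help improve the speed.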
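Update 2:
Using the SQL Fiddle from @Abecee, I was able to get a faster query. However, I'm now running into two problems.
The big picture is still the same. We are looking for possible duplicates based on first name, last name, and e-mail address. However, we've also added a search criterion, which is the set of lines wrapped inside If searchstring <> "" Then ... End If. Also note that the e-mail information is now pulled from a separate table called EmailTable, with the fields ID, IndividualID (representing Table1.id), and Email.
Mods: The updated query is similar to, but slightly different from, the original query above. I'm not sure whether it would be better to create a brand new question, so I'll leave it here for now. Let me know if I should move this to its own question.
1. If the code associated with Comment A below is uncommented, i.e. sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) ", then I get the error message: Microsoft JET Database Engine (0x80040E14) Join expression not supported. The query seems to be coded properly, so what is missing or incorrect?
2. If the code associated with Comment B below is uncommented, i.e. sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0", then the query runs but hangs. I tried dropping the query directly into Access to see whether it would work (e.g., New Query > SQL View), but it hangs there as well. I think the syntax and logic are right, but obviously something is off. Do you see what in this line would cause the hang, or why?
Here is the updated query: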
sql1 = sql1 & "SELECT "
sql1 = sql1 & " T1.ID, T1.FN, T1.LN, E1.Email, "
sql1 = sql1 & " T2.ID, T2.FN, T2.LN "
sql1 = sql1 & "FROM "
sql1 = sql1 & " ((Table1 T1 LEFT JOIN [SELECT E1.* FROM EmailTable E1 WHERE E1.Primary = True]. AS E1 ON T1.ID = E1.IndividualID)"
sql1 = sql1 & " LEFT JOIN (Table1 T2 LEFT JOIN EmailTable E2 ON T2.ID = E2.IndividualID) "
sql1 = sql1 & " ON "
sql1 = sql1 & " ("
sql1 = sql1 & " T1.ID <> T2.ID "
sql1 = sql1 & " AND "
sql1 = sql1 & " ("
sql1 = sql1 & " ((INSTR(T1.FN, T2.FN) > 0) AND (INSTR(T1.LN, T2.LN) > 0)) "
' A. When the following line is uncommented, error is "Join expression not supported."
' sql1 = sql1 & " OR (INSTR(E1.Email, E2.Email) > 0) "
sql1 = sql1 & " ) "
sql1 = sql1 & " ) "
sql1 = sql1 & " ) "
If searchstring <> "" Then
    sql1 = sql1 & " WHERE "
    sql1 = sql1 & " INSTR(T1.FN & ' ' & T1.LN, '" & Replace(searchstring, "'", "''") & "') > 0"
    ' B. When the following line is uncommented, code hangs on the rst1.open() call.
    ' sql1 = sql1 & " OR INSTR(E1.Email, '" & Replace(searchstring, "'", "''") & "') > 0"
End If
sql1 = sql1 & " ORDER BY T1.LN, T1.FN, T1.ID"
prevID = 0
numDups = 0
possibleDups = ""
' Cursor type 0 is forward-only, so the loop only ever calls MoveNext.
Call rst1.Open(sql1, cnx, 0, 1)
While Not rst1.EOF
    id = rst1.Fields("T1.ID").Value
    If id <> prevID Then
        ' We've come across a new T1 person. Show the one we just finished (if any).
        If prevID <> 0 Then
            Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
        End If
        ' Get initial values for the new person.
        fn = rst1.Fields("T1.FN").Value
        ln = rst1.Fields("T1.LN").Value
        email = rst1.Fields("Email").Value
        If IsNull(email) Or IsEmpty(email) Then
            email = ""
        End If
        ' Reset the counter and the list of possible duplicates.
        numDups = 0
        possibleDups = ""
        prevID = id
    End If
    ' If there is an ID from the second table, then keep track of this possible duplicate.
    tmp = rst1.Fields("T2.ID").Value
    If IsNumeric(tmp) Then
        tmp = CLng(tmp)
    Else
        tmp = 0
    End If
    If tmp > 0 Then
        numDups = numDups + 1
        If possibleDups <> "" Then
            possibleDups = possibleDups & ", "
        End If
        possibleDups = possibleDups & tmp
    End If
    Call rst1.MoveNext()
Wend
Call rst1.Close()
' Show the last person, which the loop above has not written out yet.
If prevID <> 0 Then
    Call Response.Write(fn & " " & ln & " has " & numDups & " possible duplicates (" & possibleDups & ")")
End If
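Answer 0 (score: 3)
Yes, that is going to be slow, because LIKE '%whatever%' is not sargable. So if [Table1] has 1,000 rows then, at best, you will be retrieving the other 999 rows for each row in the table, which means you will be pulling 999,000 rows in total.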
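To make "sargable" concrete, here is a minimal sketch that asks a query planner how it would run an exact match versus a leading-wildcard LIKE. It uses Python's built-in sqlite3 module purely as a stand-in because it is easy to run; the Jet engine behind Access plans queries differently, and the index name idx_name is made up for the illustration. The general point should still carry over: an equality test can seek into an index, while '%...%' forces every row to be examined.

```python
import sqlite3

# In-memory stand-in for Table1 (SQLite, not Access/Jet).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, FN TEXT, LN TEXT, Email TEXT)"
)
# Hypothetical index on the name columns.
conn.execute("CREATE INDEX idx_name ON Table1 (LN, FN)")


def plan(sql, params=()):
    """Return the planner's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Exact match: the planner can seek directly into the index.
exact_plan = plan(
    "SELECT ID FROM Table1 WHERE LN = ? AND FN = ?", ("Doe", "John")
)

# Leading wildcard: no seek is possible, so every row is scanned.
wildcard_plan = plan("SELECT ID FROM Table1 WHERE LN LIKE ?", ("%Doe%",))

print(exact_plan)
print(wildcard_plan)
```

In SQLite the first plan starts with SEARCH (an index seek) and the second with SCAN (a full pass). The exact wording varies between SQLite versions.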
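A few observations:
1. You are comparing every row in the table against every other row. That is something you might want to do once to find possible duplicates in legacy data, but as part of the normal operation of an application we would expect to compare one record against all the others (i.e., the one record you are inserting or updating).
2. You are looking for rows WHERE 'fn1 ln1' LIKE('%fn2 ln2%'). How is that significantly different from WHERE fn1=fn2 AND ln1=ln2? That would be sargable, so if you had indexes on [FN] and [LN] it could speed things up a great deal.
3. You really should not be using an Access database as the back end for a web application (ref: here).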
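Observations 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs against SQLite through Python's sqlite3 module rather than Access, the sample rows and the index names idx_name and idx_email are invented, and the helper names find_duplicates and all_duplicates are hypothetical. Jet may well require the join condition to be restructured (the question already shows it rejecting an OR inside a join), so treat the SQL as a starting point rather than something to paste in as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, FN TEXT, LN TEXT, Email TEXT)"
)
# Hypothetical indexes that exact-match comparisons can use.
conn.execute("CREATE INDEX idx_name ON Table1 (LN, FN)")
conn.execute("CREATE INDEX idx_email ON Table1 (Email)")
# Invented sample data.
conn.executemany(
    "INSERT INTO Table1 (ID, FN, LN, Email) VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "jdoe@example.com"),
        (2, "John", "Doe", None),
        (3, "Sam", "Jones", "sam@example.com"),
        (4, "Jane", "Smith", "js@example.com"),
        (5, "J.", "Smith", "js@example.com"),
    ],
)


def find_duplicates(conn, person_id, fn, ln, email):
    """Observation 1: check ONE record (e.g. the one being saved)."""
    sql = (
        "SELECT ID FROM Table1 "
        "WHERE ID <> ? AND ((FN = ? AND LN = ?) OR Email = ?) "
        "ORDER BY ID"
    )
    return [row[0] for row in conn.execute(sql, (person_id, fn, ln, email))]


def all_duplicates(conn):
    """One-off pass over the whole table using a single self-join."""
    sql = (
        "SELECT T1.ID, T2.ID FROM Table1 AS T1 "
        "LEFT JOIN Table1 AS T2 "
        "ON T1.ID <> T2.ID "
        "AND ((T1.FN = T2.FN AND T1.LN = T2.LN) OR T1.Email = T2.Email) "
        "ORDER BY T1.ID, T2.ID"
    )
    result = {}
    for id1, id2 in conn.execute(sql):
        dups = result.setdefault(id1, [])
        # The LEFT JOIN yields NULL (None) when a person has no match.
        if id2 is not None:
            dups.append(id2)
    return result


# ID 0 stands for a record that has not been inserted yet.
print(find_duplicates(conn, 0, "John", "Doe", "js@example.com"))
for person_id, dups in all_duplicates(conn).items():
    print(person_id, "has", len(dups), "possible duplicates", dups)
```

The single-record check is the one an application would normally run on insert or update. The self-join replaces the nested recordset loop with one round trip, at the cost of matching only exact names and e-mail addresses.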