Ways to improve search efficiency

Date: 2016-03-28 15:22:43

Tags: python sql performance search big-o

I need some feedback. For one of my projects I am building Six Degrees of Wikipedia. To keep it short, I have finished all the data cleaning and inserted the data into a table in MSSQL. So far everything works. I can search for a connection from a start point to a finish point up to the third degree, but beyond that it simply takes too long to process. I am looking for ways to change my code to make it more efficient. I am fairly new at this and it is my first attempt, so it is probably not done the best way (in fact, I know it may be the worst way I could have done it).

Any feedback would be greatly appreciated.

# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""

import pyodbc
import time
#import re

#rex = re.compile('(\(\'[a-zA-Z0-9]+\', \')(\w\\))')
start_time = time.time()


listinit = []
listseconditeration = []
listthirditeration = []
listfourthiteration = []
listfifthiteration = []
listsixthiteration = []

start = input ("Select start location :")
finish = input ("Select finish location :")
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=johndoe-PC\SQLEXPRESS;DATABASE=master;UID=-----;PWD=------;Trusted_Connection=yes')
cursor = cnxn.cursor()
cursor.execute("select * from join_table where link1 like '%s'" % (start))

rows = cursor.fetchall()
for row in rows:
    listinit.append(row)

for element in listinit:
    var1 = str(element)
    var1 = var1.replace("'","")
    var1 = var1.replace("(","")
    var1 = var1.replace(")","")
    var1 = var1.replace(",","")
    var1 = var1.replace(" ","")
    var1 = var1.replace(start,"")
    listseconditeration.append(var1)


if (finish) in (listseconditeration):
    print("one degree away")
    print("%s minutes" % (time.time() - start_time))


for element in listseconditeration:
    var2 = str(element)

    cursor.execute("select * from join_table where link1 like '%s'" % (var2))
    rows1 = cursor.fetchall()

    for row in rows1:

        listthirditeration.append(row)

        for element in listthirditeration:
            var3 = str(element)
            var3 = var3.replace("'","")
            var3 = var3.replace("(","")
            var3 = var3.replace(")","")
            var3 = var3.replace(",","")
            var3 = var3.replace(" ","")
            var3 = var3.replace(var2, "")
            listfourthiteration.append(var3)



if (finish) in (listfourthiteration):
    print("two degree away")
    print("%s minutes" % (time.time() - start_time))



for element in listfourthiteration:
    var4 = str(element)

    cursor.execute("select * from join_table where link1 like '%s'" % (var4))
    rows2 = cursor.fetchall()

    for row in rows2:

        listfifthiteration.append(row)

        for element in listfifthiteration:
            var5 = str(element)
            var5 = var5.replace("'","")
            var5 = var5.replace("(","")
            var5 = var5.replace(")","")
            var5 = var5.replace(",","")
            var5 = var5.replace(" ","")
            var5 = var5.replace(var4, "")
            listsixthiteration.append(var5)
        print(row)


if (finish) in (listsixthiteration):
    print("three degree away")

1 Answer:

Answer 0 (score: 0)

There are several problems with this code.

The first and most important problem is that the code does nothing to prevent re-processing elements that have already been visited. For example, when you go chicago -> fall, nothing in your routine prevents going back to chicago and creating a cycle. That makes the size of the search blow up very quickly.

One solution is to maintain a set of the elements that have already been visited:

visited = set()

...

    for element in listthirditeration:
        var3 = str(element)
        var3 = var3.replace("'","")
        var3 = var3.replace("(","")
        var3 = var3.replace(")","")
        var3 = var3.replace(",","")
        var3 = var3.replace(" ","")
        var3 = var3.replace(var2, "")

        if var3 not in visited:
            visited.add(var3)
            listfourthiteration.append(var3)

... repeat for the other similar loops ...
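Combining the visited set with a queue, the six hard-coded lists can be collapsed into one breadth-first search loop. This is only a sketch: `get_links` is a hypothetical helper standing in for the `join_table` query, backed here by a plain dictionary so the control flow is clear.

```python
from collections import deque

# Hypothetical stand-in for "select link2 from join_table where link1 = ?".
# In the real script this function would run the pyodbc query instead.
GRAPH = {
    "chicago": ["fall", "illinois"],
    "fall": ["chicago", "leaves"],
    "illinois": ["springfield"],
    "leaves": [],
    "springfield": [],
}

def get_links(page):
    return GRAPH.get(page, [])

def degrees(start, finish, max_depth=6):
    """Breadth-first search; returns the degree of separation or None."""
    if start == finish:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for link in get_links(page):
            if link == finish:
                return depth + 1
            if link not in visited:      # never revisit a page
                visited.add(link)
                queue.append((link, depth + 1))
    return None
```

With the sample data above, `degrees("chicago", "springfield")` returns 2, and the visited set guarantees each page is expanded at most once.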

The second problem is that the code makes extra copies of the lists of elements and keeps them around longer than needed:

cursor.execute("select * from join_table where link1 like '%s'" % (var2))

rows1 = cursor.fetchall()           # <<== fetchall() returns a list of results

for row in rows1:

    listthirditeration.append(row)  # <<== this makes a second copy of the results
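One way to avoid the second copy is to iterate over the cursor directly instead of calling fetchall() and appending each row to another list; pyodbc cursors, like most DB-API cursors, are iterable. A minimal sketch using the stdlib sqlite3 module in place of the MSSQL connection (the table and data are made up for illustration):

```python
import sqlite3

# In-memory database standing in for the MSSQL join_table.
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()
cursor.execute("create table join_table (link1 text, link2 text)")
cursor.executemany(
    "insert into join_table values (?, ?)",
    [("chicago", "fall"), ("chicago", "illinois")],
)

# Iterate the cursor directly: no fetchall() list, no second copy.
# The parameterized "?" also avoids quoting problems in the query.
cursor.execute(
    "select link2 from join_table where link1 = ? order by link2",
    ("chicago",),
)
links = [row[0] for row in cursor]
```

Selecting only the `link2` column also removes the need for all the string replace() calls that strip tuple formatting.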

A trick used when doing a breadth-first search like this is to run two searches, one from the start toward the finish and one from the finish toward the start, and let them meet in the middle.
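That meet-in-the-middle idea can be sketched like this: grow a frontier from each end, alternating sides, and stop as soon as a newly generated page is already known to the other side. As before, `get_links` is a hypothetical stand-in for the database query, backed by a small made-up adjacency dictionary:

```python
# Hypothetical adjacency data standing in for join_table lookups;
# in the real script get_links would run the pyodbc query instead.
LINKS = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}

def get_links(page):
    return LINKS.get(page, [])

def bidirectional_degrees(start, finish):
    """Expand alternately from both ends; return the degree or None."""
    if start == finish:
        return 0
    dist_f = {start: 0}   # distance of each seen page from start
    dist_b = {finish: 0}  # distance of each seen page from finish
    front, back = [start], [finish]
    expand_front = True
    while front and back:
        if expand_front:
            frontier, dist_here, dist_other = front, dist_f, dist_b
        else:
            frontier, dist_here, dist_other = back, dist_b, dist_f
        next_level = []
        for page in frontier:
            for link in get_links(page):
                if link in dist_other:        # the two searches met
                    return dist_here[page] + 1 + dist_other[link]
                if link not in dist_here:
                    dist_here[link] = dist_here[page] + 1
                    next_level.append(link)
        if expand_front:
            front = next_level
        else:
            back = next_level
        expand_front = not expand_front
    return None
```

The payoff is that each side only has to explore about half the depth: two searches of depth 3 touch far fewer pages than one search of depth 6, since the frontier grows exponentially with depth.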