Python: splitting a string read from a file on whitespace

Asked: 2017-09-23 00:15:52

Tags: python list binary

I'm trying to implement a binary search. I want the file's contents split into individual values like these:

  

751 755 762 763 774 777 785 797 798 809 814 817 822 824 827 841 847 866 881 891 903 904 908 913 918 919 925 933 940 948 949 968 972 981 988 992 995 1010 1012 1016 1018 1024 1026 1040 1051 1070 1072 1075 1082 1087 1088 1090 1098 1099 1114 1126 1135 1141 1144 1152 1153 1156 1164 1174 1177 1179 1180 1186 1192 1202 1204 1207 1218 1224 1235 1249 1251 1253 1272 1289 1290 1301 1302 1315 1322 ... (15K more)

so that I can search them with the following code:

def binarySearch(newdata,number):
    lower = 0
    upper = len(newdata)

    while lower < upper:
        x = lower + (upper - lower) // 2
        val = newdata[x]
        if number == val:
            return x
        elif number > val:
            if lower == x:
                break
            lower = x
        elif number < val:
            upper = x
    return None

If the data is not already sorted, I sort it first with this:

#SORT
def sorrt(data):
    result = []
    if len(data) < 2:
        return data
    mid = len(data) // 2
    y = sorrt(data[:mid])
    z = sorrt(data[mid:])
    while (len(y)>0) or (len(z)>0):
        if (len(y)>0) and (len(z)>0):
            if y[0] > z[0]:
                result.append(z[0])
                z.pop(0)
            else:
                result.append(y[0])
                y.pop(0)
        elif len(z)>0:
            # only z has items left: copy them all at once
            # (popping from a list while iterating over it skips elements)
            result.extend(z)
            z = []
        else:
            # only y has items left
            result.extend(y)
            y = []

    return result

So my whole code is:

#BINARY SEARCH
fileName = 'sorted15000.txt'

#SORT
def sorrt(data):
    result = []
    if len(data) < 2:
        return data
    mid = len(data) // 2
    y = sorrt(data[:mid])
    z = sorrt(data[mid:])
    while (len(y)>0) or (len(z)>0):
        if (len(y)>0) and (len(z)>0):
            if y[0] > z[0]:
                result.append(z[0])
                z.pop(0)
            else:
                result.append(y[0])
                y.pop(0)
        elif len(z)>0:
            # only z has items left: copy them all at once
            # (popping from a list while iterating over it skips elements)
            result.extend(z)
            z = []
        else:
            # only y has items left
            result.extend(y)
            y = []

    return result



def binarySearch(newdata,number):
    lower = 0
    upper = len(newdata)

    while lower < upper:
        x = lower + (upper - lower) // 2
        val = newdata[x]
        if number == val:
            return x
        elif number > val:
            if lower == x:
                break
            lower = x
        elif number < val:
            upper = x
    return None


import time

#ACTIVATE
with open(fileName) as file:
    data = file.read().split()

number = raw_input("What  number?: ")
start_time = time.clock()   # start timing once the number has been entered

newdata = sorrt(data)

pos = binarySearch(newdata,number)
print pos
print "\nTime: "
print time.clock() - start_time, "seconds"

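I want to find the position of the value I'm searching for, held in the variable number. But the position I get is way off: for example, searching for 755 returns something like 7565. What is causing this? I'm sure I've used .split() correctly here.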
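One way to check the `.split()` claim is to look at what it returns for a short sample. The sketch below uses a made-up sample string rather than the real file, and Python 3 syntax; `sample` and `tokens` are illustrative names, not part of the code above.

```python
# Hypothetical sample that mimics the file's irregular spacing.
sample = "751 755 762   866 881\n1010 1012"

# With no arguments, split() breaks on any run of whitespace
# (spaces, tabs, newlines) and never yields empty pieces.
tokens = sample.split()

print(tokens)
print(type(tokens[0]))  # each element is a str, not an int
```

The split itself behaves as expected here; what is worth noticing is the element type it produces.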

1 Answer:

Answer 0: (score: 0)

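The search logic itself is fine, but it is operating on strings when you need integers. Strings are compared character by character, so you end up with '123' < '5'. Convert your input before sorting and searching: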

...
data = [int(x) for x in data]
number = int(number)
newdata = sorrt(data)
...
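To see why string data sends the search to the wrong position, the sketch below compares lexicographic and numeric ordering on a few values taken from the question's sample, then cross-checks the index with the standard library's `bisect` module. It is written in Python 3 syntax, and `find_index` is a hypothetical helper, not part of the original code.

```python
from bisect import bisect_left

raw = "751 755 762 1010 1012 1322".split()  # strings, as read from the file

# Strings compare character by character, so '1010' sorts before '751'.
str_sorted = sorted(raw)
print(str_sorted)   # ['1010', '1012', '1322', '751', '755', '762']

# After conversion the order is numeric.
nums = sorted(int(x) for x in raw)
print(nums)         # [751, 755, 762, 1010, 1012, 1322]

def find_index(sorted_values, target):
    """Return the index of target in sorted_values, or None if absent."""
    i = bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return None

print(find_index(nums, 755))          # 1
print(find_index(str_sorted, '755'))  # 4
```

With 15,000 values, every string that starts with '1' sorts ahead of '755', which is consistent with the far-off position reported in the question.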