Question

I have a list of countries in a separate file (countries.txt), and I need to do a binary search to find a country and report the information given for it.

My file:

Afghanistan,    647500.0,   25500100

Albania,    28748.0,    2821977

Algeria,    2381740.0,  38700000

American Samoa, 199.0,  55519

Andorra,    468.0,  76246

Angola, 1246700.0,  20609294

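If I want to find the area and population of Albania and I type getCountry(Albania) into the shell, how do I get it to report the information provided?

This is what I have so far...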

def getCountry(key):

    start = "%s" #index
    end = len("%s")-1 #index
    while start<=end:
        mid = (start + end) / 2
        if '%s'[mid] == key: #found it!
            return True
        elif "%s"[mid] > key:
            end = mid -1
        else:
            start = mid + 1
    #end < start 
    return False

Answer 1

I would use a dictionary:

def get_countries(filename):
    with open(filename) as f:
        # Split each non-blank line on commas and strip the padding around each field.
        country_iter = (
            [field.strip() for field in line.split(',')]
            for line in f if line.strip()
        )
        return {
            country: {"area": area, "population": population}
            for country, area, population in country_iter
        }

if __name__ == '__main__':
    d = get_countries("countries.txt")
    print(d)
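To answer the original question of reporting Albania's area and population, a lookup against such a dictionary could be sketched as follows. The inline sample text, the `parse_countries` helper and the float/int conversion are assumptions made for illustration; they are not part of the answer above.

```python
import io

# Assumed sample data in the same layout as countries.txt.
SAMPLE = """\
Afghanistan,    647500.0,   25500100
Albania,    28748.0,    2821977
Algeria,    2381740.0,  38700000
"""

def parse_countries(lines):
    """Build {name: {"area": float, "population": int}} from comma-separated lines."""
    countries = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, area, population = (field.strip() for field in line.split(","))
        countries[name] = {"area": float(area), "population": int(population)}
    return countries

countries = parse_countries(io.StringIO(SAMPLE))
info = countries.get("Albania")  # None if the country is missing
if info is not None:
    print("Albania: area", info["area"], "population", info["population"])
```

Using `dict.get` rather than indexing avoids a `KeyError` when the requested country is not in the file.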

If you really are set on a binary search, it looks more like this:

def get_countries(filename):
    with open(filename) as f:
        # One [name, area, population] list per non-blank line, fields stripped.
        return [
            [field.strip() for field in line.split(',')]
            for line in f if line.strip()
        ]

def get_country_by_name(countries, name):
    # countries must be sorted by name (the first field).
    lo, hi = 0, len(countries) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countries[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return country
    # The search range is empty, so the name is not in the list.
    return None

if __name__ == '__main__':
    a = get_countries("countries.txt")
    print(a)
    c = get_country_by_name(a, "Albania")
    print(c)

But that is writing the binary search yourself. If you don't have to code the binary search and can use a library routine, it looks like this:

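from bisect import bisect_left

def get_country_by_name(countries, name):
    country_names = [country[0] for country in countries]
    i = bisect_left(country_names, name)
    # bisect_left returns an insertion point, so confirm it is an actual match.
    if i < len(countries) and country_names[i] == name:
        return countries[i]
    return None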

Answer 2

Solve this problem step by step.

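1. Start with a sorted list and implement the binary search on that list in a function.
2. Make sure it works for an empty list, a one-item list, and so on.
3. Write a function that takes an unsorted list, sorts it, and returns the result from the first function.
4. Write a function that takes a list of tuples, with a string as the key and other values as the data. It should sort the data on the key and return what you want.
5. Write a function that reads the file, constructs data compatible with step 4, and returns the selected item.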
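The steps above can be sketched roughly as follows. All function names here are hypothetical, and the parsing assumes the "name, area, population" layout shown in the question; treat it as one possible outline rather than the intended solution.

```python
def binary_search(sorted_items, target):
    """Steps 1-2: return the index of target in sorted_items, or None if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        elif sorted_items[mid] > target:
            hi = mid - 1
        else:
            return mid
    return None  # also covers the empty list

def search_unsorted(items, target):
    """Step 3: sort a copy, then search; the index refers to the sorted copy."""
    return binary_search(sorted(items), target)

def lookup(records, key):
    """Step 4: records are (key, data) tuples; return the data for key, or None."""
    ordered = sorted(records, key=lambda record: record[0])
    keys = [record[0] for record in ordered]
    index = binary_search(keys, key)
    return None if index is None else ordered[index][1]

def lookup_in_lines(lines, key):
    """Step 5: parse 'name, area, population' lines into tuples, then look up."""
    records = []
    for line in lines:
        if line.strip():  # ignore blank lines
            name, area, population = (f.strip() for f in line.split(","))
            records.append((name, (float(area), int(population))))
    return lookup(records, key)
```

An open file object can be passed as `lines`, for example `lookup_in_lines(open("countries.txt"), "Albania")`, since iterating over a file yields its lines.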

Work your way up to the more complex problem in digestible steps.

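Note: this is clearly an assignment for learning how to implement an algorithm. If the real goal were to find information in a file, using a dictionary would be the wrong approach. The right way is to read each line until the country is found, which means a single comparison against, on average, half the entries in the file. No wasted storage, and no time wasted on extra comparisons or hashing.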
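The line-by-line scan described in the note could look something like this. The `find_country` name, the inline sample and the float/int conversion are assumptions for illustration only.

```python
import io

def find_country(lines, name):
    """Scan lines in order and return (area, population) for name, or None."""
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == name:
            # Stop at the first match; later lines are never read.
            return float(fields[1]), int(fields[2])
    return None

# Assumed sample standing in for an open countries.txt file.
sample = io.StringIO(
    "Afghanistan,    647500.0,   25500100\n"
    "Albania,    28748.0,    2821977\n"
    "Algeria,    2381740.0,  38700000\n"
)
result = find_country(sample, "Albania")
print(result)
```

Because the function returns as soon as the name matches, nothing beyond the current line is held in memory.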

Answer 3

As Ashwini suggested in the comments, you can use a dictionary in Python. It would look like this:

countries = {'Afghanistan': (647500.0, 25500100),
    'Albania': (28748.0, 2821977),
    'Algeria': (2381740.0, 38700000),
    'American Samoa': (199.0, 55519),
    'Andorra': (468.0, 76246),
    'Angola': (1246700.0, 20609294)}

# Each value is an (area, population) tuple, so index 0 is the area.
print(countries['Angola'][0])

You can learn more about dictionaries and tuples from the Python documentation.

Answer 4

The other answers are right that you should use a dictionary, but since I'm guessing this is homework, the first thing you need is a list:

with open("countries.txt") as f:
    # Skip blank lines, then split each remaining line on "," and strip
    # the whitespace around every field.
    # The result is a list of [name, area, population] lists.
    country_list = [[field.strip() for field in line.split(",")]
                    for line in f if line.strip()]

Then you just search your sorted list like any other binary search.

To do the binary search, you would do something like this (recursive version):

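def bin_search(a_sorted_list, target, lo=0, hi=None):
    # Search the half-open range [lo, hi) so returned indices refer to the full list.
    if hi is None:
        hi = len(a_sorted_list)
    if lo >= hi:
        return None  # empty range: target is not present
    mid_pt = (lo + hi) // 2
    if target < a_sorted_list[mid_pt]:
        return bin_search(a_sorted_list, target, lo, mid_pt)
    elif target > a_sorted_list[mid_pt]:
        return bin_search(a_sorted_list, target, mid_pt + 1, hi)
    return mid_pt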

In your case you will need a few small modifications.
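One way those modifications might look is to compare only the name field of each record and return the whole record on a match. The `find_by_name` name and the sample records below are hypothetical; this is a sketch under the assumption that the records are already sorted by name.

```python
def find_by_name(sorted_records, name, lo=0, hi=None):
    """Recursive binary search over records sorted by their first field."""
    if hi is None:
        hi = len(sorted_records)
    if lo >= hi:
        return None  # empty range: name is not present
    mid = (lo + hi) // 2
    mid_name = sorted_records[mid][0]
    if name < mid_name:
        return find_by_name(sorted_records, name, lo, mid)
    if name > mid_name:
        return find_by_name(sorted_records, name, mid + 1, hi)
    return sorted_records[mid]

# Assumed records, in the shape produced by splitting each file line on ",".
records = [
    ["Afghanistan", "647500.0", "25500100"],
    ["Albania", "28748.0", "2821977"],
    ["Algeria", "2381740.0", "38700000"],
]
match = find_by_name(records, "Albania")
print(match)
```

Passing index bounds instead of slicing keeps each recursive call from copying part of the list.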

Binary search for names
