Question

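I have a homework problem:

Write a function named readCountries that reads a file and returns a list of countries. The countries should be read from this file (countries.txt), which contains an incomplete list of countries with their area and population. Each line in this file represents one country in the following format:
name, area(in km2), population
When opening the file, your function should handle any exceptions that may occur. Your function should read in the file completely and separate the data into a 2-dimensional list. You may need to split and strip the data as appropriate. Numbers should be converted to their correct types. Your function should return this list so that you can use it in the remaining questions.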

I have a text file named "countries.txt" that contains a list of countries with their area and population.

Sample of "countries.txt":

Afghanistan,    647500.0,   25500100
Albania,    28748.0,    2821977
Algeria,    2381740.0,  38700000

Here is my code, and it works:

def readCountries(filename):
    '''read a file and print it to the screen'''
    countryList = []
    for line in open(filename):
        with open(filename) as aFile:
            countries = aFile.read()
            countryList.append(line.strip().split())
    aFile.close()

    return countryList

Sample output when I run it:

>>> countryList = readCountries("countries.txt")
>>> countryList
[['Afghanistan,', '647500.0,', '25500100'], ['Albania,', '28748.0,', '2821977'], ['Algeria,', '2381740.0,', '38700000']]

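The next question is:

Write a function named printCountry that takes a string representing a country name as a parameter. First call your answer from question 1 to get the list of countries, then do a binary search through the list and print the country's information if it is found. It should print the following: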
printCountry("Canada")
  Canada, Area: 9976140.0,    Population: 35295770
printCountry("Winterfell")
  I'm sorry, could not find Winterfell in the country list.

But I can't figure it out.

When I tried to code this question, I wrote:

countryList = readCountries("countries.txt")  
def printCountry(name):
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countryList[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return country[0] + ", Area: " + str(country[1]) + ",    Population: " + str(country[2])
    return "I'm sorry can not find " + str(name)

The result was:

>>> printCountry("Canada")
'Sorry can not find Canada'

This happens even though Canada is in the text file. Where did I go wrong?

Answer 1

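Your binary search code is (mostly) fine, but there are a couple of problems in the code that reads in the country list.

Your file opening and reading code is odd. It looks like you've combined two different approaches to reading the data, so you end up opening the file multiple times.

Fortunately, these lines: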

with open(filename) as aFile:
    countries = aFile.read()

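don't affect the output of the readCountries function, because you never do anything else with countries.

Also, your assignment description says "You may need to split and strip the data as appropriate. Numbers should be converted to their correct types", and your code doesn't do that. As I hinted above, this means the country names in the list still have commas attached, so the binary search can't find them (unless you include the comma in the name you search for).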
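To make the failure concrete, here is a small sketch using one line in the format of the sample data. It shows that splitting on whitespace leaves the comma on the name, so the comparison with "Canada" fails. The second half is an alternative that assumes every line is strictly comma-separated; it would not cope with a line that is missing a comma.

```python
line = "Canada,     9976140.0,  35295770\n"

# Splitting on whitespace keeps the trailing commas on the fields.
fields = line.strip().split()
print(fields)                 # ['Canada,', '9976140.0,', '35295770']
print(fields[0] == "Canada")  # False

# Alternative sketch (assumes strictly comma-separated lines):
# split on commas, then strip the whitespace from each piece.
name, area, population = [part.strip() for part in line.split(",")]
record = [name, float(area), int(population)]
print(record)                 # ['Canada', 9976140.0, 35295770]
```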

Anyway, here's a cleaned-up version, designed to run on Python 2.6 or later.

from __future__ import print_function

def readCountries(filename):
    countryList = []
    with open(filename) as aFile:
        for line in aFile:
            line = line.strip().split()
            #Remove any trailing commas from each field
            line = [s.rstrip(',') for s in line]
            #Convert area to float and population to int
            line = [line[0], float(line[1]), int(line[2])]
            #print line
            countryList.append(line)
    return countryList

countryList = readCountries("countries.txt")

def printCountry(name):
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countryList[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            print('  {0}, Area: {1}, Population: {2}'.format(*country))
            break
    else:
        print("  I'm sorry, could not find {0} in the country list.".format(name))

#tests
printCountry("Canada")
printCountry("Winterfell")

print('- ' * 20)

#make sure we can find the first & last countries.
printCountry("Afghanistan")
printCountry("Nowhere")

Here's the data file I ran it on:

countries.txt

Afghanistan,    647500.0,   25500100
Albania,    28748.0,    2821977
Algeria,    2381740.0,  38700000
Canada,     9976140.0,  35295770
Nowhere,    1000.0      2345678

Here's the output it produces:

  Canada, Area: 9976140.0, Population: 35295770
  I'm sorry, could not find Winterfell in the country list.
- - - - - - - - - - - - - - - - - - - - 
  Afghanistan, Area: 647500.0, Population: 25500100
  Nowhere, Area: 1000.0, Population: 2345678
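The assignment also asks the function to handle any exceptions raised when opening the file, which the cleaned-up version does not attempt. A minimal sketch of one way to do that follows. The name read_countries_safely, the error message, the choice to return an empty list on failure, and the decision to skip lines that don't have three fields are all assumptions made for illustration, not part of the assignment.

```python
from __future__ import print_function

def read_countries_safely(filename):
    """Hypothetical variant of readCountries that copes with an unopenable file."""
    countryList = []
    try:
        aFile = open(filename)
    except IOError as err:
        # Assumption: report the problem and return an empty list.
        print("Could not open {0}: {1}".format(filename, err))
        return countryList
    with aFile:
        for line in aFile:
            # Split on whitespace and drop any trailing commas.
            fields = [s.rstrip(',') for s in line.split()]
            if len(fields) != 3:
                continue  # assumption: skip blank or malformed lines
            countryList.append([fields[0], float(fields[1]), int(fields[2])])
    return countryList

# A missing file produces a message and an empty list instead of a traceback.
print(read_countries_safely("no_such_file.txt"))
```

Like the version above, this splits on whitespace, so a country name containing a space would need different handling.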

Answer 2

Make sure the list is sorted before you do the binary search:

countryList.sort()
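Here is a short sketch of why the order matters. The helper find_country is a hypothetical name; it runs the same binary search as in the question but returns the record instead of printing. The records come from the sample data and are deliberately placed out of order.

```python
def find_country(countries, name):
    """Binary search over [name, area, population] records sorted by name."""
    lo, hi = 0, len(countries) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        test_name = countries[mid][0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return countries[mid]
    return None

# Records in a deliberately unsorted order.
countries = [
    ["Canada", 9976140.0, 35295770],
    ["Algeria", 2381740.0, 38700000],
    ["Afghanistan", 647500.0, 25500100],
    ["Albania", 28748.0, 2821977],
]

before = find_country(countries, "Afghanistan")
print(before)  # None: the search looks in the wrong half of the unsorted list

countries.sort()  # lists compare element by element, so this sorts by name
after = find_country(countries, "Afghanistan")
print(after)   # ['Afghanistan', 647500.0, 25500100]
```

If the file is already in alphabetical order, as the sample appears to be, the sort changes nothing, but it guards against a file that isn't.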
