Question

I have a .csv file that looks like this when opened in Excel: [image]

My code:

myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "r")

countries = []
for item in myfile:
    a = item.split(",")
    countries.append(a)

hdi_list = []
for acountry in countries:
    hdi = acountry[3]

    try:
        hdi_list.append(float(hdi))
    except ValueError:
        # skip rows whose value is not a number (e.g. the header)
        pass

average = round(sum(hdi_list)/len(hdi_list), 2)
maxNumber = round(max(hdi_list), 2)
minNumber = round(min(hdi_list), 2)

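This code works well, but when I find the max, min, or average, I also need to get the corresponding country name and print it.

How can I change the code to get the country names for the min, max, and average?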

Answer 1

Instead of putting the values directly in the list, store tuples, like this:

hdi_list.append((float(hdi), acountry[1]))

Then you can use it like this:

maxTuple = max(hdi_list)
maxNumber = round(maxTuple[0], 2)
maxCountry = maxTuple[1]
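
To show the tuple idea end to end, here is a minimal sketch. The `rows` list is hypothetical sample data standing in for the parsed CSV, and it assumes (as in the question) that the country name is in column 1 and the HDI in column 3.

```python
# Hypothetical rows standing in for the parsed CSV lines.
# Assumed layout: column 1 = country name, column 3 = HDI value.
rows = [
    ["Rank", "Country", "Region", "HDI"],  # header row, skipped below
    ["1", "Norway", "x", "83.27"],
    ["2", "Ireland", "x", "75.47"],
    ["3", "Liechtenstein", "x", "88.97"],
]

hdi_list = []
for acountry in rows:
    try:
        hdi_list.append((float(acountry[3]), acountry[1]))
    except ValueError:
        pass  # non-numeric value such as the header

# Tuples compare element by element, so max/min order by HDI first.
max_hdi, max_country = max(hdi_list)
min_hdi, min_country = min(hdi_list)
average = round(sum(hdi for hdi, _ in hdi_list) / len(hdi_list), 2)

print(max_country, round(max_hdi, 2))
print(min_country, round(min_hdi, 2))
print(average)
```

The average is computed over all rows, so it has no single corresponding country. If two countries tie on HDI, the tuple comparison falls back to the name, so only one of them is returned.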

Answer 2

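Using the pandas module, [4], [5], and [6] below show the maximum, the minimum, and the mean, respectively. Note that the data below does not match your country data.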

In [1]: import pandas as pd

In [2]: df = pd.read_csv("hdi.csv")

In [3]: df
Out[3]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  86.31
9         Sweden  80.54

In [4]: df.loc[df["HDI"].idxmax()]
Out[4]: 
Country    Liechtenstein
HDI                88.97
Name: 7, dtype: object

In [5]: df.loc[df["HDI"].idxmin()]
Out[5]: 
Country    Ireland
HDI          75.47
Name: 6, dtype: object

In [6]: df["HDI"].mean()
Out[6]: 84.484999999999985

Suppose Liechtenstein and Germany both have the maximum value:

In [15]: df
Out[15]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  88.97
9         Sweden  80.54

In [16]: df[df["HDI"] == df["HDI"].max()]
Out[16]: 
         Country    HDI
7  Liechtenstein  88.97
8        Germany  88.97

The same logic can be applied to the minimum.

Answer 3

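The approach below is close enough to your implementation that I think it may be useful. However, if you start working with larger or more complex csv files, you should look at tools such as "csv.reader" (from the standard csv module) or "Pandas" (as mentioned earlier). They are more robust and efficient at handling complex .csv data. You could also use the "xlrd" package to work with Excel files.

In my opinion, the simplest way to tie each country name to its value is to combine the "for loops". Instead of looping over your data twice (in two separate 'for loops') and creating two separate lists, use a single 'for loop' and build dictionaries holding the related data (i.e. "country name", "hdi"). You could also create tuples (as mentioned earlier), but I think dictionaries are more explicit.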

countries = []
with open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "r") as myfile:
    for line in myfile:
        fields = line.strip().split(",")
        try:
            value_of_interest = float(fields[3])
        except (IndexError, ValueError):
            continue  # skip the header row and rows without a numeric value
        countries.append(
            {"Country Name": fields[1],
             "Value of Interest": value_of_interest})

values = [country["Value of Interest"] for country in countries]
ave_value = sum(values) / len(values)
max_value = max(values)
min_value = min(values)

print("Country Average ==", ave_value)
for country in countries:
    if country["Value of Interest"] == max_value:
        print("Max == {country}:{value}".format(
            country=country["Country Name"], value=country["Value of Interest"]))
    if country["Value of Interest"] == min_value:
        print("Min == {country}:{value}".format(
            country=country["Country Name"], value=country["Value of Interest"]))

Note that this approach returns multiple countries if several share the same min/max value.
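
The same single-loop dictionary idea can be sketched with the standard csv module. This is only an illustration: the in-memory sample and the column names "Country" and "HDI" are assumptions, not the layout of the asker's file.

```python
import csv
import io

# Hypothetical in-memory CSV; the column names are assumptions.
sample = io.StringIO(
    "Country,HDI\n"
    "Norway,83.27\n"
    "Ireland,75.47\n"
    "Liechtenstein,88.97\n"
    "Germany,88.97\n"
)

countries = []
for row in csv.DictReader(sample):
    try:
        countries.append({"Country Name": row["Country"],
                          "Value of Interest": float(row["HDI"])})
    except ValueError:
        continue  # skip rows whose HDI is not numeric

values = [c["Value of Interest"] for c in countries]
max_value = max(values)
min_value = min(values)

# Collect every country that shares the extreme value.
max_countries = [c["Country Name"] for c in countries
                 if c["Value of Interest"] == max_value]
min_countries = [c["Country Name"] for c in countries
                 if c["Value of Interest"] == min_value]

print("Max:", max_countries, max_value)
print("Min:", min_countries, min_value)
```

With a real file, the `io.StringIO` object would be replaced by something like `open(path, newline="")`, which is how the csv documentation recommends opening files.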

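If you are dead set on creating separate lists (as in your current implementation), you could consider using zip() to join the lists by index, where

list(zip(countries, hdi_list)) == [(countries[0], hdi_list[0]), (countries[1], hdi_list[1]), ...]

For example: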

for country in zip(countries, hdi_list):
    if country[1] == max_value:
        print(country[0], country[1])

Apply similar logic to the min and average. This approach works, but it is less explicit and harder to maintain.
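
A minimal sketch of the zip() variant follows, using hypothetical parallel lists `names` and `hdi_list`. One caveat: zip() pairs items purely by position, so both lists must be appended to in the same loop iteration. In the question's code, `countries` also contains rows that were skipped when building `hdi_list`, so those two lists would likely not line up as they stand.

```python
# Hypothetical parallel lists; they must have the same length and order.
names = ["Norway", "Ireland", "Liechtenstein"]
hdi_list = [83.27, 75.47, 88.97]

max_value = max(hdi_list)
min_value = min(hdi_list)

# zip() pairs items by position: (names[0], hdi_list[0]), ...
max_names = [name for name, hdi in zip(names, hdi_list) if hdi == max_value]
min_names = [name for name, hdi in zip(names, hdi_list) if hdi == min_value]

print("Max:", max_names, max_value)
print("Min:", min_names, min_value)
```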

Find the max number in a .CSV file in Python
