Question

CSV file: [image: csv file]

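I have a CSV data file that contains state names, crop types, and various values. I want to create a dictionary within a dictionary so that the output looks like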

{'Corn': {'Illinois': ['93']}}
{'Soybeans': {'Illinois': ['94']}}

where the structure is {'crop type': {'state': ['max_value']}}.

This is my current code:

STATES = ['Alaska', 'Alabama', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']

def open_file():
    fp = open('alltablesGEcrops.csv', 'r')
    return fp

def read_file(fp):
    fp.readline()
    dict1 = {}
    dict2 = {}
    for line in fp:
        line_lst = line.strip().split(',')
        state = line_lst[0]
        crop = line_lst[1]
        variety = line_lst[3]
        year = int(line_lst[4])
        value = line_lst[6]
        if variety == 'All GE varieties' and state == 'Illinois':
            max_value = max(value, key=int)
            dict1.setdefault(state,[]).append(max_value)
            dict2 = {crop:dict1}
            print(dict2)

def main():
    fp = open_file()
    data = read_file(fp)
    print(data)

if __name__ == "__main__":
    main()

Its output looks like this: [image: code output]

How can I fix my code so that it prints only the last line for each crop type? Also, when I look for the maximum value, it always prints

{'Soybeans': {'Illinois': ['7', '6', '2', '8', '3', '6', '5', '7', ...]}}

instead of

{'Soybeans': {'Illinois': ['94']}}

How can I fix this?

Answer 1

You can do this without pandas, but why would you?

import pandas as pd

# load dataframe
df = pd.read_csv('alltablesGEcrops.csv', na_values={"Value": ("*", ".")})

# produce results
print(df.groupby(['State', 'Crop'])['Value'].max())

which gives

State           Crop
Alabama         Upland cotton    98
Arkansas        Soybeans         99
                Upland cotton    99
California      Upland cotton     9
Georgia         Upland cotton    99
Illinois        Corn             93
                Soybeans         94
Indiana         Corn              9
                Soybeans         96
Iowa            Corn             95
                Soybeans         97
Kansas          Corn             95
                Soybeans         96
Louisiana       Upland cotton    99
Michigan        Corn             93
                Soybeans         95
Minnesota       Corn             93
                Soybeans         96
Mississippi     Soybeans         99
                Upland cotton    99
Missouri        Corn             93
                Soybeans         94
Missouri 2/     Upland cotton    99
Nebraska        Corn             96
                Soybeans         97
North Carolina  Upland cotton    98
North Dakota    Soybeans         98
North Dakota    Corn             97
Ohio            Corn              9
                Soybeans         91
Other States    Corn             91
                Soybeans         94
                Upland cotton    98
South Dakota    Corn             98
                Soybeans         98
Tennessee       Upland cotton    99
Texas           Upland cotton    93
Texas           Corn             91
U.S.            Corn             93
                Soybeans         94
                Upland cotton    96
Wisconsin       Corn             92
                Soybeans         95
Name: Value, dtype: object

Answer 2

You can try this using only dictionaries:

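from collections import defaultdict

# Read all rows and split each one on commas
with open('alltablesGEcrops.csv') as fp:
    rows = [line.strip('\n').split(',') for line in fp]

# d maps crop -> {state: [max_value]}
d = defaultdict(dict)

for row in rows[1:]:  # skip the header row
    state, crop, value = row[0], row[1], row[-1]
    if not value.isdigit():
        continue  # assumes non-numeric entries such as '*' or '.' mark missing data
    # Compare numerically; comparing the raw strings would rank '9' above '94'
    if state not in d[crop] or int(value) > int(d[crop][state][0]):
        d[crop][state] = [value]

print(dict(d))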
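For comparison, here is a minimal sketch of how the question's `read_file` could be reworked with the standard `csv` module. It keeps a running maximum per crop and state and returns the nested dictionary once, instead of printing inside the loop. The function name `max_by_crop_and_state`, the sample rows, and every header name other than State, Crop, and Value are hypothetical; the column positions (0, 1, 3, 6) are taken from the question's code and have not been checked against the real file.

```python
import csv
import io


def max_by_crop_and_state(fp, variety='All GE varieties'):
    """Return {crop: {state: [max_value]}} for rows matching `variety`."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the header row
    best = {}  # (crop, state) -> largest integer value seen so far
    for row in reader:
        # Column positions follow the question's code (an assumption)
        state, crop, row_variety, value = row[0], row[1], row[3], row[6]
        if row_variety != variety or not value.strip().isdigit():
            continue  # other variety, or a non-numeric marker such as '*'
        key = (crop, state)
        best[key] = max(best.get(key, -1), int(value))
    # Reshape into the nested form requested in the question
    result = {}
    for (crop, state), value in best.items():
        result.setdefault(crop, {})[state] = [str(value)]
    return result


# Made-up sample rows in the assumed column layout, not the real data
sample = io.StringIO(
    "State,Crop,Unit,Variety,Year,Table,Value\n"
    "Illinois,Corn,x,All GE varieties,2015,x,89\n"
    "Illinois,Corn,x,All GE varieties,2016,x,93\n"
    "Illinois,Corn,x,Bt only,2016,x,99\n"
    "Illinois,Soybeans,x,All GE varieties,2015,x,94\n"
    "Illinois,Soybeans,x,All GE varieties,2016,x,*\n"
    "Iowa,Corn,x,All GE varieties,2016,x,9\n"
)
result = max_by_crop_and_state(sample)
print(result)
```

To run it on the real file, something like `with open('alltablesGEcrops.csv', newline='') as fp: print(max_by_crop_and_state(fp))` should work, provided the column layout matches the assumption above.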

Create a dictionary within a dictionary and find the maximum value in Python 3.x

2 answers: