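CSV file: (image: csv file)
I have a CSV data file that contains state names, crop types, and various values. I want to build a dictionary of dictionaries so that the output looks like
{'Corn': {'Illinois': ['93']}}
{'Soybeans': {'Illinois': ['94']}}
where the structure is {'crop type': {'state': ['max_value']}}.
Here is my code so far: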
STATES = ['Alaska', 'Alabama', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
def open_file():
    fp = open('alltablesGEcrops.csv', 'r')
    return fp

def read_file(fp):
    fp.readline()
    dict1 = {}
    dict2 = {}
    for line in fp:
        line_lst = line.strip().split(',')
        state = line_lst[0]
        crop = line_lst[1]
        variety = line_lst[3]
        year = int(line_lst[4])
        value = line_lst[6]
        if variety == 'All GE varieties' and state == 'Illinois':
            max_value = max(value, key=int)
            dict1.setdefault(state,[]).append(max_value)
            dict2 = {crop:dict1}
            print(dict2)

def main():
    fp = open_file()
    data = read_file(fp)
    print(data)

if __name__ == "__main__":
    main()
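Its output looks like this: (image: code output)
How can I fix my code so that it prints only the last line for each crop type? Also, when I look for the maximum value, it always prints
{'Soybeans': {'Illinois': ['7', '6', '2', '8', '3', '6', '5', '7', ...]}}
instead of
{'Soybeans': {'Illinois': ['94']}}
How can I fix this?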
Answer 0: (score: 1)
You can do this without pandas, but why would you?
import pandas as pd
# load dataframe
df = pd.read_csv('alltablesGEcrops.csv', na_values={"Value": ("*", ".")})
# produce results
print(df.groupby(['State', 'Crop'])['Value'].max())
which gives
State           Crop
Alabama         Upland cotton    98
Arkansas        Soybeans         99
                Upland cotton    99
California      Upland cotton     9
Georgia         Upland cotton    99
Illinois        Corn             93
                Soybeans         94
Indiana         Corn              9
                Soybeans         96
Iowa            Corn             95
                Soybeans         97
Kansas          Corn             95
                Soybeans         96
Louisiana       Upland cotton    99
Michigan        Corn             93
                Soybeans         95
Minnesota       Corn             93
                Soybeans         96
Mississippi     Soybeans         99
                Upland cotton    99
Missouri        Corn             93
                Soybeans         94
Missouri 2/     Upland cotton    99
Nebraska        Corn             96
                Soybeans         97
North Carolina  Upland cotton    98
North Dakota    Soybeans         98
North Dakota    Corn             97
Ohio            Corn              9
                Soybeans         91
Other States    Corn             91
                Soybeans         94
                Upland cotton    98
South Dakota    Corn             98
                Soybeans         98
Tennessee       Upland cotton    99
Texas           Upland cotton    93
Texas           Corn             91
U.S.            Corn             93
                Soybeans         94
                Upland cotton    96
Wisconsin       Corn             92
                Soybeans         95
Name: Value, dtype: object
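The `dtype: object` line suggests the values were compared as strings rather than numbers, which would explain entries such as 9 for Indiana corn (as a string, '9' sorts above '89'). A minimal sketch of one way around that, assuming the `State`, `Crop`, and `Value` column names used above: convert the column with `pd.to_numeric` before grouping, then fold the result into the nested dictionary the question asks for. The sample rows are hypothetical and stand in for alltablesGEcrops.csv.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real data lives in alltablesGEcrops.csv
csv_text = (
    "State,Crop,Value\n"
    "Indiana,Corn,9\n"
    "Indiana,Corn,86\n"
    "Illinois,Corn,93\n"
    "Illinois,Soybeans,94\n"
    "Illinois,Soybeans,*\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Turn the column into numbers; anything non-numeric (such as '*') becomes NaN
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")

# Numeric maximum per (crop, state) pair, ignoring rows without a value
maxima = df.dropna(subset=["Value"]).groupby(["Crop", "State"])["Value"].max()

# Reshape into {crop: {state: [max_value]}} with string values, as in the question
result = {}
for (crop, state), value in maxima.items():
    result.setdefault(crop, {})[state] = [str(int(value))]

print(result)
```

With the numeric conversion in place, the Indiana corn maximum in this sample is 86 rather than 9.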
Answer 1: (score: 0)
You can try this using only dictionaries:
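from collections import defaultdict

# Read all rows, skipping the header line
with open('alltablesGEcrops.csv') as fp:
    rows = [line.strip().split(',') for line in fp][1:]

# d[crop][state] holds a one-element list with the largest value seen so far
d = defaultdict(dict)
for row in rows:
    state, crop, value = row[0], row[1], row[-1]
    if not value.isdigit():
        continue  # skip placeholders such as '*' or '.'
    # Compare as integers; comparing the raw strings would rank '9' above '89'
    if state not in d[crop] or int(value) > int(d[crop][state][0]):
        d[crop][state] = [value]
print(dict(d))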
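As for the single digits in the question's own output: `max(value, key=int)` receives one string such as '94', and `max` iterates over a string character by character, so it returns the largest digit rather than the largest value. The list then grows because that digit is appended on every matching row. Below is a rough sketch of a reworked `read_file` that keeps a running maximum per crop and state and returns the dictionary once at the end. The column positions (0, 1, 3, 6) are taken from the question's code; the sample rows, the `variety_filter` parameter, and the other column names are made up for illustration.

```python
import csv
import io

# max() over a string walks its characters, so this picks the largest digit
print(max('94', key=int))  # prints 9


def read_file(fp, variety_filter='All GE varieties'):
    """Return {crop: {state: [max_value]}} for rows matching variety_filter."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the header row
    best = {}  # (crop, state) -> largest integer value seen so far
    for row in reader:
        if len(row) < 7:
            continue  # ignore short or blank lines
        state, crop, variety, value = row[0], row[1], row[3], row[6].strip()
        if variety != variety_filter or not value.isdigit():
            continue  # skip other varieties and placeholders such as '*' or '.'
        key = (crop, state)
        best[key] = max(best.get(key, 0), int(value))
    # Reshape into the nested layout, storing values as strings like the question
    result = {}
    for (crop, state), value in best.items():
        result.setdefault(crop, {})[state] = [str(value)]
    return result


# Hypothetical sample rows; only the column positions follow the question's code
sample = io.StringIO(
    "State,Crop,Attribute,Variety,Year,Unit,Value\n"
    "Illinois,Corn,x,All GE varieties,2000,Percent,17\n"
    "Illinois,Corn,x,All GE varieties,2016,Percent,93\n"
    "Illinois,Corn,x,Bt only,2016,Percent,99\n"
    "Illinois,Soybeans,x,All GE varieties,2016,Percent,94\n"
    "Illinois,Soybeans,x,All GE varieties,2001,Percent,*\n"
)
print(read_file(sample))
```

Because the function returns the dictionary instead of printing inside the loop, `print(data)` in `main()` would show the final result once rather than `None`.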