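First, I want to say that I'm not asking you to write the code. I just want to discuss, and get feedback on, the best way to write this program, because I've been trying to figure out how to approach this problem.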
My program should open a CSV file with 7 columns:
Name of the state,Crop,Crop title,Variety,Year,Unit,Value.
Here is part of the file:
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2012,Percent of all corn planted,60
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2013,Percent of all corn planted,73
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2014,Percent of all corn planted,78
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2015,Percent of all corn planted,76
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2016,Percent of all corn planted,75
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2001,Percent of all corn planted,12
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2002,Percent of all corn planted,13
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2003,Percent of all corn planted,16
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2004,Percent of all corn planted,21
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2005,Percent of all corn planted,26
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2006,Percent of all corn planted,40
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2007,Percent of all corn planted,59
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2008,Percent of all corn planted,78
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2009,Percent of all corn planted,79
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2010,Percent of all corn planted,83
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2011,Percent of all corn planted,85
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2012,Percent of all corn planted,84
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2013,Percent of all corn planted,85
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2014,Percent of all corn planted,88
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2015,Percent of all corn planted,88
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2016,Percent of all corn planted,86
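It should then read each row into a dictionary. The file has many rows, and the only ones I want/need are those where the Variety column is "All GE varieties". Note that each state also has multiple rows. The next step is to take the crop as user input and check only the data for that crop. The final step is to find, for each state, the maximum and minimum values together with their corresponding years, and print them.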
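The way I was thinking of solving this is to create a set for each row, check whether "All GE varieties" is in the set, and if it is, add the row to the dictionary. Then do something similar for the crop?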
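My biggest problems are probably: 1) I don't know how to ignore the rows that don't contain "All GE varieties". Do I do that before or after creating the dictionary? 2) I know how to create a dictionary with one key and one value, but how do I add the rest of the values to that key? Do you do it with a set? Or a list?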
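For reference, one common pattern that addresses both points is to filter each row as it is read, before it ever goes into the dictionary, and to store a list under each state key. A list fits better than a set here, because a set is unordered and would collapse repeated values such as the two 88s in the data. A minimal sketch, assuming the file has no header row; `load_rows` is a hypothetical helper name, and the sample rows are copied from the excerpt above instead of being read from a real file:

```python
import csv
import io

# Three rows copied from the excerpt above (a real program would use open() instead).
SAMPLE = (
    "Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2016,Percent of all corn planted,75\n"
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11\n"
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2001,Percent of all corn planted,12\n"
)


def load_rows(f):
    """Map each state to a list of (year, value) pairs from 'All GE varieties' rows."""
    by_state = {}
    for state, crop, title, variety, year, unit, value in csv.reader(f):
        # Filter first: rows for other varieties never reach the dictionary.
        if variety != 'All GE varieties':
            continue
        # setdefault creates the list the first time a state is seen; later rows append to it.
        by_state.setdefault(state, []).append((int(year), int(value)))
    return by_state


data = load_rows(io.StringIO(SAMPLE))
print(data)  # {'Indiana': [(2000, 11), (2001, 12)]}
```

The "Stacked gene varieties" row is skipped before anything is stored, so there is nothing to remove afterwards.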
Answer 0
Checking whether "All GE varieties" is in the string is fairly simple: use the in keyword:
with open(datafile, 'r') as infile:
    for line in infile:
        if "All GE varieties" in line:
            pass  # put the line into a data structure
For the data structure, I lean toward a list of dictionaries, where each dictionary has a defined set of keys:
myList = [ {}, {}, {}, ... ]
The problem in this case is that, if every field is a value, I'm not sure what you would use as keys. Also remember that the split() method can help:
varieties = []
with open(datafile, 'r') as infile:
    for line in infile:
        if "All GE varieties" in line:
            # strip() removes the trailing newline so the last field is just the number
            varieties.append(line.strip().split(','))
This gives you a list of lists (varieties), where each inner list holds the individual fields of one row.
Something like this:
varieties = [['Indiana','Corn','Genetically engineered (GE) corn','All GE varieties','2000','Percent of all corn planted','11'], ['Indiana','Corn','Genetically engineered (GE) corn','All GE varieties','2001','Percent of all corn planted','12'], ... ]
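From here it's easy to pick out the state, the year, and so on by indexing, as you would with a 2D array (for example, varieties[0][4] is the year of the first row).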
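To make the 2D-array comparison concrete: the first index picks a row and the second picks a field, while a whole column needs a list comprehension or `zip(*...)`, because Python lists do not support column slicing directly. A small sketch using three of the rows shown above; `YEAR` and `VALUE` are hypothetical constants for the column positions:

```python
# Three rows in the shape produced by line.strip().split(',')
varieties = [
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties', '2000', 'Percent of all corn planted', '11'],
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties', '2001', 'Percent of all corn planted', '12'],
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties', '2014', 'Percent of all corn planted', '88'],
]

YEAR, VALUE = 4, 6  # column positions in the 7-column layout

first_year = varieties[0][YEAR]           # one cell: row 0, column 4
years = [row[YEAR] for row in varieties]  # one whole column, as a list
columns = list(zip(*varieties))           # transpose: columns[i] is a tuple holding column i
values = columns[VALUE]

# Fields are strings, so convert before comparing ('9' > '88' is True for strings).
highest = max(varieties, key=lambda row: int(row[VALUE]))
print(first_year, years, values, highest[YEAR])
```

Converting with `int()` inside the key matters as soon as a state has values with different numbers of digits.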
Answer 1
As already mentioned, you can use the csv module to read the CSV file. I'm not sure how you want to structure the data under the state key, but I think it would probably be a bit better to be able to look up each specific crop_title and then access the value for each year.
In[33]: from collections import defaultdict
   ...: from csv import reader
   ...:
   ...: crops = defaultdict(lambda: defaultdict(dict))
   ...: with open('hmm.csv', 'r') as csvfile:
   ...:     cropreader = reader(csvfile)
   ...:     for row in cropreader:
   ...:         state, crop_type, crop_title, variety, year, unit, value = row
   ...:         if variety == 'All GE varieties':
   ...:             crops[state][crop_title][year] = value
   ...:
In[34]: crops
Out[34]:
defaultdict(<function __main__.<lambda>>,
            {'Indiana': defaultdict(dict,
                         {'Genetically engineered (GE) corn': {'2000': '11',
                           '2001': '12',
                           '2002': '13',
                           '2003': '16',
                           '2004': '21',
                           '2005': '26',
                           '2006': '40',
                           '2007': '59',
                           '2008': '78',
                           '2009': '79',
                           '2010': '83',
                           '2011': '85',
                           '2012': '84',
                           '2013': '85',
                           '2014': '88',
                           '2015': '88',
                           '2016': '86'}})})
In[35]: crops['Indiana']['Genetically engineered (GE) corn']['2000']
Out[35]: '11'
In[36]: crops['Indiana']['Genetically engineered (GE) corn']['2015']
Out[36]: '88'
You could also convert year and value to integers when storing them, by using this line inside the loop instead:
    crops[state][crop_title][int(year)] = int(value)
Lookups then take integer years and return integers; for example, crops['Indiana']['Genetically engineered (GE) corn'][2000] returns 11.
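With this nested layout, the per-state minimum and maximum for one crop title can be read straight off the innermost `{year: value}` dict. A minimal sketch, assuming the integer conversion above has been applied; `min_max_by_state` is a hypothetical helper, and the sample dict holds only a few of the Indiana values:

```python
# Nested layout from this answer, crops[state][crop_title][year] = value,
# with integer years and values (a small subset of the Indiana data).
crops = {
    'Indiana': {
        'Genetically engineered (GE) corn': {2000: 11, 2001: 12, 2014: 88, 2015: 88, 2016: 86},
    },
}


def min_max_by_state(crops, crop_title):
    """Return {state: ((min_year, min_value), (max_year, max_value))} for one crop title."""
    result = {}
    for state, titles in crops.items():
        by_year = titles.get(crop_title)  # .get does not create entries, even on a defaultdict
        if not by_year:
            continue  # this state has no data for the requested crop title
        # key=by_year.get compares years by their values; on ties the first year seen wins
        lo = min(by_year, key=by_year.get)
        hi = max(by_year, key=by_year.get)
        result[state] = ((lo, by_year[lo]), (hi, by_year[hi]))
    return result


# In the real program crop_title could come from input(); it is hard-coded here.
summary = min_max_by_state(crops, 'Genetically engineered (GE) corn')
print(summary)  # {'Indiana': ((2000, 11), (2014, 88))}
```

Note that this layout is keyed by crop_title, so user input would have to match the title text rather than the shorter Crop column.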
Answer 2
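I put your data into a file named "crop_data.csv". Here's some code that uses the standard csv module to read each row into its own dictionary. Because we pass fieldnames to DictReader ourselves, every line of the file, including the first, is treated as data. We use a simple if test to make sure we only keep rows where 'Variety' == 'All GE varieties', and we store each state's data in all_data, which is a dictionary of lists, one list per state. Since the state name is used as the key in all_data, we don't need to keep it in the row dictionary; likewise, we can discard 'Variety', because we no longer need that information.
Once all the data has been collected, we can use the json module to print it nicely.
Then we loop over all_data, state by state, and compute the minimum and maximum for each.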
import csv
from collections import defaultdict
import json

filename = 'crop_data.csv'
fieldnames = 'Name,Crop,Title,Variety,Year,Unit,Value'.split(',')

all_data = defaultdict(list)
# The csv module documentation recommends opening files with newline=''
with open(filename, newline='') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=fieldnames)
    for row in reader:
        # We only want 'All GE varieties'
        if row['Variety'] == 'All GE varieties':
            state = row['Name']
            # Get rid of unneeded fields
            del row['Name'], row['Variety']
            # Store it as a plain dict
            all_data[state].append(dict(row))

# Show all the data
print(json.dumps(all_data, indent=4))

# Find minimums & maximums
# Extract the 'Value' field from dict d and convert it to a number
def value_key(d):
    return int(d['Value'])

for state, data in all_data.items():
    print(state)
    row = min(data, key=value_key)
    print('min', row['Value'], row['Year'])
    row = max(data, key=value_key)
    print('max', row['Value'], row['Year'])
Output
{
    "Indiana": [
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2000",
            "Unit": "Percent of all corn planted",
            "Value": "11"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2001",
            "Unit": "Percent of all corn planted",
            "Value": "12"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2002",
            "Unit": "Percent of all corn planted",
            "Value": "13"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2003",
            "Unit": "Percent of all corn planted",
            "Value": "16"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2004",
            "Unit": "Percent of all corn planted",
            "Value": "21"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2005",
            "Unit": "Percent of all corn planted",
            "Value": "26"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2006",
            "Unit": "Percent of all corn planted",
            "Value": "40"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2007",
            "Unit": "Percent of all corn planted",
            "Value": "59"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2008",
            "Unit": "Percent of all corn planted",
            "Value": "78"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2009",
            "Unit": "Percent of all corn planted",
            "Value": "79"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2010",
            "Unit": "Percent of all corn planted",
            "Value": "83"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2011",
            "Unit": "Percent of all corn planted",
            "Value": "85"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2012",
            "Unit": "Percent of all corn planted",
            "Value": "84"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2013",
            "Unit": "Percent of all corn planted",
            "Value": "85"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2014",
            "Unit": "Percent of all corn planted",
            "Value": "88"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2015",
            "Unit": "Percent of all corn planted",
            "Value": "88"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2016",
            "Unit": "Percent of all corn planted",
            "Value": "86"
        }
    ]
}
Indiana
min 11 2000
max 88 2014
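The code above summarizes every crop in the file, while the question also asks to restrict the search to one crop chosen by the user. A minimal sketch of that step under the same no-header-row assumption; `min_max_for_crop` is a hypothetical helper, the crop is passed in as an argument (in the real program it could come from `input()`), and the match on the `Crop` column is made case-insensitive as an assumption about how users will type it:

```python
import csv
import io

FIELDNAMES = ['Name', 'Crop', 'Title', 'Variety', 'Year', 'Unit', 'Value']


def value_key(row):
    # Compare rows by their numeric 'Value' field
    return int(row['Value'])


def min_max_for_crop(csvfile, crop):
    """Return {state: (min_row, max_row)} for the 'All GE varieties' rows of one crop."""
    wanted = crop.strip().lower()  # tolerate input such as ' corn '
    per_state = {}
    for row in csv.DictReader(csvfile, fieldnames=FIELDNAMES):
        if row['Variety'] == 'All GE varieties' and row['Crop'].lower() == wanted:
            per_state.setdefault(row['Name'], []).append(row)
    return {state: (min(rows, key=value_key), max(rows, key=value_key))
            for state, rows in per_state.items()}


# Three rows copied from the question's excerpt
SAMPLE_TEXT = (
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11\n"
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2014,Percent of all corn planted,88\n"
    "Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2016,Percent of all corn planted,75\n"
)

# Real program: crop = input('Crop: ') and open('crop_data.csv', newline='') instead of StringIO
result = min_max_for_crop(io.StringIO(SAMPLE_TEXT), 'corn')
for state, (lo, hi) in result.items():
    print(state, 'min', lo['Value'], lo['Year'], 'max', hi['Value'], hi['Year'])
```

A crop that does not appear in the file simply yields an empty dictionary, so the print loop produces no output for it.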
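Note that in this data there are two years with a value of 88. If you want to break ties by year, you can use a more sophisticated key function than value_key. Alternatively, you can use value_key to sort a state's entire data list, which makes it easy to extract all of the lowest and highest records. For example, doing
    data.sort(key=value_key)
    print(json.dumps(data, indent=4))
inside the for state, data loop will print all of that state's records in numerical order.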
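Both ideas from the note can be sketched briefly. Assuming records shaped like the per-state data lists above (only Year and Value kept here, values copied from the Indiana output), a tuple key breaks ties by year, and a second pass collects every year that reaches the maximum; `value_then_year` is a hypothetical name:

```python
# Two Indiana records share the maximum value of 88 (2014 and 2015).
data = [
    {'Year': '2013', 'Value': '85'},
    {'Year': '2014', 'Value': '88'},
    {'Year': '2015', 'Value': '88'},
    {'Year': '2016', 'Value': '86'},
]


def value_then_year(d):
    # Compare by value first, then by year, so max() prefers the later year on ties
    return int(d['Value']), int(d['Year'])


latest_max = max(data, key=value_then_year)

# Or collect every year that reaches the maximum value
top = max(int(d['Value']) for d in data)
all_max_years = [d['Year'] for d in data if int(d['Value']) == top]
print(latest_max['Year'], all_max_years)  # 2015 ['2014', '2015']
```

With the same tuple key, min() would prefer the earlier year among tied minimums.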