How to extract columns and rows from a CSV file using Python

Asked: 2011-04-11 12:25:10

Tags: python csv

I have this input in file.csv:

"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54

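I want to write a simple program that finds the city with the least rainfall, which in this case is Missouri. How can I do that with the Python csv reader?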

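I can try to extract the items, but unfortunately the first line of the file has to stay there. I would like to build something like count[Missouri] = 300, count[Amsterdam] = 1212, and so on, so that I can take the minimum and refer back to it to print the city.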

Please advise. Thanks.

5 answers:

Answer 0 (score: 5)

import csv

def main():
    with open('file.csv', 'rb') as inf:
        data = [(int(row['rainfall']), row['']) for row in csv.DictReader(inf)]

    data.sort()
    print data[0]

if __name__=="__main__":
    main()

This returns:

(300, 'Missouri')

Answer 1 (score: 1)

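One way to do this is to write a function that uses the csv module's DictReader class to extract a column of data. DictReader handles the first row of field names automatically. The built-in min() function can then be used to find the item with the smallest value in the column.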

import csv

def csv_extract_col(csvinput, colname, key):
    """ extract a named column from a csv stream into a dictionary
          colname:  name of columm to extract
          key:  name of another columm to use as keys in returned dict
    """
    col = {}
    for row in csv.DictReader(csvinput):
        col[row[key]] = row[colname]
    return col

if __name__=='__main__':
    import StringIO

    # The first row holds the field names; keep comments out of the CSV text itself.
    csvdata = """\
"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
"""
    csvfile = StringIO.StringIO(csvdata)

    rainfall = csv_extract_col(csvfile, 'rainfall', '')
    print rainfall
    # {'Amsterdam': '1212', 'LA': '1000', 'Missouri': '300'}

    print min(rainfall.iteritems(), key=lambda r: float(r[1]))
    # ('Missouri', '300')
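
For readers on Python 3, the same idea can be sketched as follows. This is a minimal adaptation, not the answerer's code: `io.StringIO` replaces the `StringIO` module, `print` is a function, and `dict.items()` replaces `iteritems()`. The helper name `extract_column` is made up for this illustration.

```python
import csv
import io


def extract_column(csvinput, colname, key):
    """Map each row's `key` field to its `colname` field (values stay strings)."""
    return {row[key]: row[colname] for row in csv.DictReader(csvinput)}


csvdata = '''\
"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
'''

rainfall = extract_column(io.StringIO(csvdata), 'rainfall', '')
# Compare numerically; the csv module returns every field as a string.
driest = min(rainfall.items(), key=lambda item: float(item[1]))
print(driest)
# ('Missouri', '300')
```

When reading from a real file in Python 3, open it in text mode with `newline=''`, for example `open('file.csv', newline='')`, as the csv module documentation recommends.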

Answer 2 (score: 0)

import StringIO
import csv

example = """"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
"""

data_in = StringIO.StringIO(example)
#data_in = open('mycsvdata.csv')

def read_data(data_in):
  reader = csv.reader(data_in)
  cols = []
  results = {}
  for row in reader:
    if not cols:
      cols = row
      continue
    row = [ int(x) if x.lstrip('-').isdigit() else x for x in row ]
    results[row[0]] = dict(zip(cols[1:],row[1:]))
  return results

data = read_data(data_in)

min(data.items(),key=lambda x: x[1].get('rainfall'))

This returns:

('Missouri', {'max': 10, 'days_clear': 23, 'rainfall': 300, 'min': -2})
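
Note that the conversion in read_data only recognises integers, because `isdigit` rejects values such as `12.5`. If the file might contain decimals, one option is a small converter that tries `int` first and then `float`. The sketch below is written for Python 3, and `to_number` is a hypothetical helper that is not part of the answer.

```python
def to_number(text):
    """Return an int or float if `text` parses as one, else the original string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


row = ['Missouri', '-2', '10', '300.5', '23']
print([to_number(x) for x in row])
# ['Missouri', -2, 10, 300.5, 23]
```

One caveat: `float()` also accepts strings such as 'nan' and 'inf', so a text field with one of those values would be converted too.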

Answer 3 (score: 0)

To read from a file, you need to remove all the code that deals with strings:

   reader = csv.reader(open('file.csv', 'rb'))
   rainfall = csv_extract_col(reader, 'rainfall', '')

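Update: Sorry, it takes a bit more work than that. The first argument of csv_extract_col is passed on as the first argument of csv.DictReader, so (in this case) it should be an open file object and should never be a csv.reader instance. See below: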

import csv

### def csv_extract_col(csvinput, colname, key):
### exactly as provided by @martineau

if __name__ == '__main__':
    import sys
    filename, data_col_name, key_col_name = sys.argv[1:4]
    input_file_object = open(filename, 'rb')
    result_dict = csv_extract_col(input_file_object, data_col_name, key_col_name)
    print result_dict
    print min(result_dict.iteritems(), key=lambda r: float(r[1]))

Results:

command-prompt>\python27\python joj_csv.py joj.csv rainfall ""
{'Amsterdam': '1212', 'LA': '1000', 'Missouri': '300'}
('Missouri', '300')

command-prompt>\python27\python joj_csv.py joj.csv days_clear ""
{'Amsterdam': '34', 'LA': '54', 'Missouri': '23'}
('Missouri', '23')

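Update 2, in response to the comment """there must be something I missed.. I tried ... [what looks like @martineau's function] with the main function you defined above. Then in my shell, I define python rainfall "". But it gives me KeyError: 'rainfall'"""

Two possibilities:

(1) You made a mistake when patching the pieces of source code together. Check your work.

(2) Your file doesn't have the expected heading row contents. Try some debugging, e.g. change @martineau's code so that you can insert print statements and assertions that show what csv.DictReader thinks the heading row is: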

reader = csv.DictReader(csvinput)
print "fieldnames", reader.fieldnames
assert colname in reader.fieldnames
assert key in reader.fieldnames
for row in reader:

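If you are still stuck, show us ALL of your code plus the full traceback and error message. Either edit your question or put it up on pastebin or dropbox; don't put it in a comment!!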
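
The debugging steps above can also be folded into a small check that fails with a readable message instead of a bare KeyError. This is a Python 3 sketch under assumptions: `checked_reader` is a hypothetical name, not something provided by the csv module, and the wrong-delimiter file is only one guess at what a bad header might look like.

```python
import csv
import io


def checked_reader(csvinput, *required):
    """Return a DictReader after verifying that the header has the required columns."""
    reader = csv.DictReader(csvinput)
    fieldnames = reader.fieldnames or []  # fieldnames is None when the input is empty
    missing = [name for name in required if name not in fieldnames]
    if missing:
        raise ValueError('missing columns %r; header row is %r' % (missing, fieldnames))
    return reader


good = io.StringIO('"","min","max","rainfall","days_clear"\n"Missouri",-2,10,300,23\n')
rows = list(checked_reader(good, 'rainfall', ''))
print(rows[0]['rainfall'])

# A file using ';' as delimiter is parsed as a single column, so the check fails.
bad = io.StringIO('city;rainfall\nMissouri;300\n')
try:
    checked_reader(bad, 'rainfall')
except ValueError as exc:
    print(exc)
```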

Answer 4 (score: 0)

My code handles the case where several cities share the same minimum or several cities share the same maximum:

import csv

def minmax_col(filename,key,colname):
    with open(filename,'rb') as csvfile:
        rid = csv.DictReader(csvfile,
                             fieldnames=None,
                             quoting=csv.QUOTE_NONNUMERIC)

        mini = float('inf')
        maxi = float('-inf')
        # Use two separate lists; "limin = limax = []" would bind both names to one list.
        limin, limax = [], []

        for row in rid:
            if row[colname] == maxi:
                limax.append(row[key])
            elif row[colname] > maxi:
                maxi = row[colname]
                limax = [row[key]]

            if row[colname] == mini:
                limin.append(row[key])
            elif row[colname] < mini:
                mini = row[colname]
                limin = [row[key]]

    return (key,(maxi,limax),(mini,limin))



key = 'rainfall'
city,(Ma,liMa),(mi,limi) = minmax_col('filename.csv','',key)
print 'Cities analysed on ' + repr(key) + ' parameter :'
print 'maximum==',Ma,'  cities :',', '.join(liMa)
print 'minimum==',mi,'  cities :',', '.join(limi)

print 

key = 'min'
city,(Ma,liMa),(mi,limi) = minmax_col('filename.csv','',key)
print 'Cities analysed on ' + repr(key) + ' parameter :'
print 'maximum==',Ma,'  cities :',', '.join(liMa)
print 'minimum==',mi,'  cities :',', '.join(limi)

On a file like this:

"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"Oslo",-2,8,800,12
"LA",10,20,1000,54
"Kologoro",28,45,1212,1

the result is:

Cities analysed on 'rainfall' parameter :
maximum== 1212.0   cities : Amsterdam, Kologoro
minimum== 300.0   cities : Missouri

Cities analysed on 'min' parameter :
maximum== 28.0   cities : Kologoro
minimum== -3.0   cities : Amsterdam
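
In Python 3 the tie handling can be sketched more compactly by reading the column once and then filtering on the extreme values. This is an illustration under assumptions, not the answerer's code: `extremes` is a hypothetical helper, every value in the chosen column is assumed to parse as a float, and all rows are held in memory.

```python
import csv
import io


def extremes(csvinput, key, colname):
    """Return ((max, [keys]), (min, [keys])) for a numeric column, keeping ties."""
    pairs = [(row[key], float(row[colname])) for row in csv.DictReader(csvinput)]
    if not pairs:
        raise ValueError('no data rows')
    hi = max(value for _, value in pairs)
    lo = min(value for _, value in pairs)
    return ((hi, [k for k, v in pairs if v == hi]),
            (lo, [k for k, v in pairs if v == lo]))


data = '''\
"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"Oslo",-2,8,800,12
"LA",10,20,1000,54
"Kologoro",28,45,1212,1
'''

(hi, hi_cities), (lo, lo_cities) = extremes(io.StringIO(data), '', 'rainfall')
print(hi, hi_cities)
# 1212.0 ['Amsterdam', 'Kologoro']
print(lo, lo_cities)
# 300.0 ['Missouri']
```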